failure in shm/test_ucp_rma.blocking_small/0 #1977

Closed
alinask opened this issue Nov 6, 2017 · 3 comments

alinask commented Nov 6, 2017

09:57:08 [----------] 9 tests from shm/test_ucp_rma
09:57:08 [ RUN      ] shm/test_ucp_rma.blocking_small/0
10:02:34 [1509955354.305137] [clx-ppc-04:28500:0]     ucp_worker.c:398  UCX  ERROR Error Endpoint timeout was not handled for ep 0x10037b5cd60 - rc/mlx4_0:1
10:02:34 /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/ucp/test_ucp_rma.cc:53: Failure
10:02:34 Error: Endpoint timeout
10:02:34 [1509955354.333524] [clx-ppc-04:28500:0]         rcache.c:300  UCX  WARN  mlx4_0: destroying inuse region 0x10037b5ce40 [0x3fff89c10000..0x3fff89c20000] g- rw ref 1 lkey 0xb00115f8 rkey 0xb00115f8 atomic: lkey 0xffffffff rkey 0x
10:02:34 [clx-ppc-04:28500:0]      rcache.c:153  Assertion `region->refcount == 0' failed
10:02:34 
10:02:34 /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucs/sys/rcache.c: [ ucs_mem_region_destroy_internal() ]
10:02:34       ...
10:02:34       150     ucs_rcache_region_trace(rcache, region, "destroy");
10:02:34       151 
10:02:34       152     ucs_assert(region->refcount == 0);
10:02:34 ==>   153     ucs_assert(!(region->flags & UCS_RCACHE_REGION_FLAG_PGTABLE));
10:02:34       154 
10:02:34       155     if (region->flags & UCS_RCACHE_REGION_FLAG_REGISTERED) {
10:02:34       156         UCS_PROFILE_CODE("mem_dereg") {
10:02:34 
10:02:35 ==== backtrace ====
10:02:35  0 0x00000000000585fc ucs_mem_region_destroy_internal()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucs/sys/rcache.c:153
10:02:35  1 0x00000000000585fc ucs_rcache_purge()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucs/sys/rcache.c:302
10:02:35  2 0x00000000000585fc ucs_rcache_t_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucs/sys/rcache.c:629
10:02:35  3 0x000000000005ebf8 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucs/type/class.c:50
10:02:35  4 0x000000000005a848 ucs_rcache_destroy()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucs/sys/rcache.c:642
10:02:35  5 0x000000000002d000 uct_ib_md_close()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/uct/ib/base/ib_md.c:1286
10:02:35  6 0x000000000001e220 uct_md_close()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/uct/base/uct_md.c:125
10:02:35  7 0x0000000000016240 ucp_free_resources()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucp/core/ucp_context.c:438
10:02:35  8 0x0000000000016240 ucp_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../src/ucp/core/ucp_context.c:887
10:02:35  9 0x000000001030169c ucs::handle<ucp_context*, void*>::release()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/test_helpers.h:373
10:02:35 10 0x000000001030169c ucs::handle<ucp_context*, void*>::reset()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/test_helpers.h:312
10:02:35 11 0x000000001030169c ~handle()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/test_helpers.h:307
10:02:35 12 0x000000001030169c ucp_test_base::entity::~entity()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/ucp/ucp_test.cc:335
10:02:35 13 0x0000000010301bd4 ucs::ptr_vector_base<ucp_test_base::entity>::release()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/test_helpers.h:240
10:02:35 14 0x0000000010301bd4 ucs::ptr_vector_base<ucp_test_base::entity>::clear()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/test_helpers.h:211
10:02:35 15 0x0000000010301bd4 ucp_test::cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/ucp/ucp_test.cc:59
10:02:35 16 0x000000001009d6f8 ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/test.cc:226
10:02:35 17 0x00000000101df218 ucp_test::TearDown()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/ucp/ucp_test.h:104
10:02:35 18 0x0000000010091974 HandleSehExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3562
10:02:35 19 0x0000000010091974 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3598
10:02:35 20 0x0000000010081384 testing::Test::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3643
10:02:35 21 0x000000001008152c testing::TestInfo::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3812
10:02:35 22 0x0000000010081778 testing::TestCase::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3930
10:02:35 23 0x0000000010087598 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:5802
10:02:35 24 0x00000000100879cc testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:5719
10:02:35 25 0x00000000100879cc HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3562
10:02:35 26 0x00000000100879cc HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:3598
10:02:35 27 0x00000000100879cc testing::UnitTest::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest-all.cc:5416
10:02:35 28 0x0000000010020c5c RUN_ALL_TESTS()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/gtest.h:20059
10:02:35 29 0x0000000010020c5c main()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/3/contrib/../test/gtest/common/main.cc:77
10:02:35 30 0x0000000000024580 generic_start_main.isra.0()  libc-start.c:0
10:02:35 31 0x0000000000024774 __libc_start_main()  ???:0
10:02:35 ===================

http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5071/label=clx-ppc-04.mtl.labs.mlnx,worker=3/console

(on ppc)

yosefe commented Dec 4, 2017

http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/label=clx-ppc-04.mtl.labs.mlnx,worker=0/5347/console

01:28:23 [ RUN      ] udrc/test_ucp_tag_xfer.send_generic_recv_contig_exp_rndv/2
01:28:23 [1512430103.646255] [clx-ppc-04:112220:0]         rcache.c:300  UCX  WARN  mlx5_0: destroying inuse region 0x1002ae91b40 [0x1002a82ffb0..0x1002a948630] g- rw ref 1 lkey 0x566b2 rkey 0x566b2 atomic: lkey 0xffffffff rkey 0xffffff
01:28:23 [clx-ppc-04:112220:0]      rcache.c:153  Assertion `region->refcount == 0' failed
01:28:23 
01:28:23 /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c: [ ucs_mem_region_destroy_internal() ]
01:28:23       ...
01:28:23       150     ucs_rcache_region_trace(rcache, region, "destroy");
01:28:23       151 
01:28:23       152     ucs_assert(region->refcount == 0);
01:28:23 ==>   153     ucs_assert(!(region->flags & UCS_RCACHE_REGION_FLAG_PGTABLE));
01:28:23       154 
01:28:23       155     if (region->flags & UCS_RCACHE_REGION_FLAG_REGISTERED) {
01:28:23       156         UCS_PROFILE_CODE("mem_dereg") {
01:28:23 
01:28:24 ==== backtrace ====
01:28:24  0 0x000000000005863c ucs_mem_region_destroy_internal()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:153
01:28:24  1 0x000000000005863c ucs_rcache_purge()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:302
01:28:24  2 0x000000000005863c ucs_rcache_t_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:629
01:28:24  3 0x000000000005ec78 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/type/class.c:50
01:28:24  4 0x000000000005a888 ucs_rcache_destroy()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:642
01:28:24  5 0x000000000002ef10 uct_ib_md_close()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/uct/ib/base/ib_md.c:1286
01:28:24  6 0x000000000001efe0 uct_md_close()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/uct/base/uct_md.c:125
01:28:24  7 0x0000000000017100 ucp_free_resources()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucp/core/ucp_context.c:483
01:28:24  8 0x0000000000017100 ucp_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucp/core/ucp_context.c:950
01:28:24  9 0x000000001031395c ucs::handle<ucp_context*, void*>::release()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:380
01:28:24 10 0x000000001031395c ucs::handle<ucp_context*, void*>::reset()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:319
01:28:24 11 0x000000001031395c ~handle()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:314
01:28:24 12 0x000000001031395c ucp_test_base::entity::~entity()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.cc:335
01:28:24 13 0x0000000010313e94 ucs::ptr_vector_base<ucp_test_base::entity>::release()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:247
01:28:24 14 0x0000000010313e94 ucs::ptr_vector_base<ucp_test_base::entity>::clear()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:218
01:28:24 15 0x0000000010313e94 ucp_test::cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.cc:59
01:28:24 16 0x000000001009f9a8 ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test.cc:226
01:28:24 17 0x00000000101e5828 ucp_test::TearDown()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.h:104
01:28:24 18 0x00000000100935b4 HandleSehExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3562
01:28:24 19 0x00000000100935b4 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3598
01:28:24 20 0x0000000010082fc4 testing::Test::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3643
01:28:24 21 0x000000001008316c testing::TestInfo::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3812
01:28:24 22 0x00000000100833b8 testing::TestCase::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3930
01:28:24 23 0x00000000100891d8 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5802
01:28:24 24 0x000000001008960c testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5719
01:28:24 25 0x000000001008960c HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3562
01:28:24 26 0x000000001008960c HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3598
01:28:24 27 0x000000001008960c testing::UnitTest::Run()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5416
01:28:24 28 0x000000001002147c RUN_ALL_TESTS()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest.h:20059
01:28:24 29 0x000000001002147c main()  /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/main.cc:77
01:28:24 30 0x0000000000024580 generic_start_main.isra.0()  libc-start.c:0
01:28:24 31 0x0000000000024774 __libc_start_main()  ???:0
01:28:24 ===================
01:28:24 Sending notification to yosefe@mellanox.com
01:28:27 [clx-ppc-04:112220:0] Process frozen...

yosefe commented Dec 4, 2017

(gdb) bt
#0  0x00003fff90cf2de0 in pause () from /usr/lib64/power8/libpthread.so.0
#1  0x00003fff9102cf64 in ucs_debug_freeze () at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/debug/debug.c:709
#2  0x00003fff9102f120 in ucs_error_freeze (message=0x3fffe50fee50 "     rcache.c:153  Assertion `region->refcount == 0' failed", 
    error_type=0x3fff9113f070 "assertion failure") at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/debug/debug.c:828
#3  ucs_handle_error (error_type=0x3fff9113f070 "assertion failure", message=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/debug/debug.c:992
#4  0x00003fff91030a50 in __ucs_abort (error_type=0x3fff9113f070 "assertion failure", file=<optimized out>, line=<optimized out>, function=<optimized out>, 
    message=<optimized out>) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/debug/log.c:228
#5  0x00003fff9103863c in ucs_mem_region_destroy_internal (region=<optimized out>, rcache=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:153
#6  ucs_rcache_purge (rcache=0x1002ac9cc90) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:302
#7  ucs_rcache_t_cleanup (self=0x1002ac9cc90) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:629
#8  0x00003fff9103ec78 in ucs_class_call_cleanup_chain (cls=<optimized out>, obj=0x1002ac9cc90, limit=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/type/class.c:50
#9  0x00003fff9103a888 in ucs_rcache_destroy (self=0x1002ac9cc90)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:642
#10 0x00003fff90f2ef10 in uct_ib_md_close (uct_md=0x1002a5696d0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/uct/ib/base/ib_md.c:1286
#11 0x00003fff90f1efe0 in uct_md_close (md=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/uct/base/uct_md.c:125
#12 0x00003fff90e37100 in ucp_free_resources (context=0x1002a4f8be0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucp/core/ucp_context.c:483
#13 ucp_cleanup (context=0x1002a4f8be0) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucp/core/ucp_context.c:950
#14 0x000000001031395c in release (this=0x1002a5934f0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:380
#15 reset (this=0x1002a5934f0) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:319
#16 ~handle (this=0x1002a5934f0, __in_chrg=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:314
#17 ucp_test_base::entity::~entity (this=0x1002a5934f0, __in_chrg=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.cc:335
#18 0x0000000010313e94 in release (this=<optimized out>, ptr=0x1002a5934f0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:247
#19 clear (this=<optimized out>) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:218
#20 ucp_test::cleanup (this=0x1002a4f0430) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.cc:59
#21 0x000000001009f9a8 in ucs::test_base::TearDownProxy (this=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test.cc:226
#22 0x00000000101e5828 in ucp_test::TearDown (this=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.h:104
#23 0x00000000100935b4 in HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x1049cef8 "TearDown()", method=<optimized out>, object=0x1002a4f0490)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3562
#24 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x1002a4f0490, method=<optimized out>, location=0x1049cef8 "TearDown()")
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3598
#25 0x0000000010082fc4 in testing::Test::Run (this=0x1002a4f0490)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3643
#26 0x000000001008316c in testing::TestInfo::Run (this=0x1002a2d5e50)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3812
#27 0x00000000100833b8 in testing::TestCase::Run (this=0x1002a1a5100)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3930
#28 0x00000000100891d8 in testing::internal::UnitTestImpl::RunAllTests (this=0x10029eb13c0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5802
#29 0x000000001008960c in RunAllTests (this=0x10029eb13c0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5719
#30 HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=<optimized out>, method=<optimized out>, object=0x10029eb13c0)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3562
#31 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x1049d3c8 "auxiliary test code (environments or event listeners)", 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x10089500 <testing::internal::UnitTestImpl::RunAllTests()>, 
    object=0x10029eb13c0) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3598
#32 testing::UnitTest::Run (this=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5416
#33 0x000000001002147c in RUN_ALL_TESTS () at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest.h:20059
#34 main (argc=1, argv=<optimized out>) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/main.cc:77
(gdb) f 5
#5  0x00003fff9103863c in ucs_mem_region_destroy_internal (region=<optimized out>, rcache=<optimized out>)
    at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:153
153	    ucs_assert(region->refcount == 0);
(gdb) f 6
#6  ucs_rcache_purge (rcache=0x1002ac9cc90) at /scrap/jenkins/workspace/hpc-ucx-pr/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:302
302	        ucs_mem_region_destroy_internal(rcache, region);
(gdb) p *region
$4 = {super = {start = 1100224855984, end = 1100226004528}, list = {prev = 0x3fffe50ff7c0, next = 0x3fffe50ff7c0}, refcount = 1, status = UCS_OK, prot = 3 '\003', 
  flags = 1}
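
As a quick cross-check (a sketch, not part of the original debugging session), the decimal `start`/`end` values printed by gdb can be converted to hex and compared with the range in the `rcache.c:300` "destroying inuse region" warning from the same run. The variable and function names below are made up for illustration; the numbers are copied from the output above.

```python
# Values copied from the gdb `p *region` output (decimal) and from the
# rcache.c:300 warning in the same run (hex). Names are illustrative only.
gdb_start, gdb_end = 1100224855984, 1100226004528
warn_start, warn_end = 0x1002a82ffb0, 0x1002a948630


def same_region(gdb_range, warn_range):
    """Return True if both (start, end) pairs describe the same address range."""
    return tuple(gdb_range) == tuple(warn_range)


print(hex(gdb_start), hex(gdb_end))  # 0x1002a82ffb0 0x1002a948630
print(same_region((gdb_start, gdb_end), (warn_start, warn_end)))  # True
print(gdb_end - gdb_start)  # 1148544 (region length in bytes)
```

So the region inspected in frame 6 is the same one named in the warning, and it still has `refcount = 1` when the rcache is purged.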

alinask commented Apr 24, 2018

This has not been reproduced for a long time.

alinask closed this as completed Apr 24, 2018