UCT/TCP/GTEST: Protect against connection from non-UCX sock-based app #4612

Merged: 1 commit into openucx:master on Jan 9, 2020

dmitrygx
Member

What

Protect the UCT/TCP listener against unexpected connections from non-UCX socket-based applications.

Why ?

Fixes #4525 (master branch):
UCT/TCP accepts a connection from a client that is not UCT/TCP and tries to receive data from it, but the client violates the UCT/TCP AM protocol.

How ?

  1. UCT/TCP client: always send a first message that includes CRC16(EP's peer address) once the connection has been established successfully.
  2. UCT/TCP server: after a connection is accepted, do not move the new EP to the main thread right away. Wait for the first message carrying CRC16(iface's listener address), and only then move the EP to the main thread.
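
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: the CRC variant (CRC-16/CCITT-FALSE), the `hello_msg_t` layout, and all function names are hypothetical, and the later review discussion moves to a fixed magic number instead of a CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). The PR text does not say
 * which CRC16 variant is meant; this one is chosen only for illustration. */
static uint16_t crc16_ccitt(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint16_t crc     = 0xFFFF;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        crc ^= (uint16_t)(p[i] << 8);
        for (bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical first message sent by the client right after connect(). */
typedef struct {
    uint16_t addr_crc; /* CRC16 of the address the client connected to */
} hello_msg_t;

/* Client side (step 1): build the first message from the peer address. */
static hello_msg_t client_make_hello(const void *peer_addr, size_t addr_len)
{
    hello_msg_t hello;

    hello.addr_crc = crc16_ccitt(peer_addr, addr_len);
    return hello;
}

/* Server side (step 2): returns 1 if the new EP may be handed over to the
 * main thread, 0 if the connection should be dropped instead. */
static int server_accepts_hello(const hello_msg_t *hello,
                                const void *listen_addr, size_t addr_len)
{
    return hello->addr_crc == crc16_ccitt(listen_addr, addr_len);
}
```

A client that speaks some other protocol is unlikely to send a matching value first, so the server can close the socket before any AM parsing starts.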

@dmitrygx
Member Author

@akesandgren could you test this solution, please?

@mellanox-github
Contributor

Mellanox CI: FAILED on 1 of 25 workers (click for details)

Note: the logs will be deleted after 31-Dec-2019

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ ABORTED
hpc-arm-cavium-jenkins_W1 ❓ ABORTED
hpc-arm-cavium-jenkins_W2 ❓ ABORTED
hpc-arm-cavium-jenkins_W3 ❓ ABORTED
hpc-arm-hwi-jenkins_W0 ❓ ABORTED
hpc-arm-hwi-jenkins_W1 ❓ ABORTED
hpc-arm-hwi-jenkins_W2 ❓ ABORTED
hpc-arm-hwi-jenkins_W3 ❓ ABORTED
hpc-test-node-gpu_W1 ❓ ABORTED
hpc-test-node-gpu_W2 ❓ ABORTED
hpc-test-node-gpu_W3 ❓ ABORTED
hpc-test-node-legacy_W0 ❓ ABORTED
hpc-test-node-legacy_W3 ❓ ABORTED
hpc-test-node-new_W0 ❓ ABORTED
hpc-test-node-new_W1 ❓ ABORTED
hpc-test-node-new_W2 ❓ ABORTED
hpc-test-node-new_W3 ❓ ABORTED
hpc-test-node-gpu_W0 ❌ FAILURE
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-legacy_W2 ❓ UNKNOWN

@akesandgren

Tests running

@dmitrygx
Member Author

> Tests running

@akesandgren appreciate your help with the testing of my patches! thank you!

test/gtest/uct/uct_test.h Outdated Show resolved Hide resolved
test/gtest/uct/tcp/test_tcp.cc Outdated Show resolved Hide resolved
test/gtest/uct/tcp/test_tcp.cc Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
@mellanox-github
Contributor

Mellanox CI: FAILED on 5 of 25 workers (click for details)

Note: the logs will be deleted after 31-Dec-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ❌ FAILURE
hpc-arm-hwi-jenkins_W0 ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-test-node-new_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@akesandgren

Kicked off two new test runs with that last commit. Two also running with the code prior to it.

@dmitrygx
Member Author

> Kicked off two new test runs with that last commit. Two also running with the code prior to it.

ok, thank you for keeping me updated!

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 01-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ ABORTED
hpc-arm-cavium-jenkins_W2 ❓ ABORTED
hpc-arm-cavium-jenkins_W3 ❓ ABORTED
hpc-arm-hwi-jenkins_W0 ❓ ABORTED
hpc-test-node-gpu_W0 ❓ ABORTED
hpc-test-node-gpu_W1 ❓ ABORTED
hpc-test-node-gpu_W3 ❓ ABORTED
hpc-test-node-legacy_W0 ❓ ABORTED
hpc-test-node-legacy_W2 ❓ ABORTED
hpc-test-node-new_W0 ❓ ABORTED
hpc-test-node-new_W1 ❓ ABORTED
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-test-node-gpu_W2 ❓ UNKNOWN
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 01-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W1 ❓ ABORTED
hpc-arm-cavium-jenkins_W2 ❓ ABORTED
hpc-test-node-gpu_W0 ❓ ABORTED
hpc-test-node-gpu_W1 ❓ ABORTED
hpc-test-node-gpu_W2 ❓ ABORTED
hpc-test-node-legacy_W1 ❓ ABORTED
hpc-test-node-new_W0 ❓ ABORTED
hpc-test-node-new_W1 ❓ ABORTED
hpc-test-node-new_W3 ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN

@mellanox-github
Contributor

Mellanox CI: FAILED on 3 of 25 workers (click for details)

Note: the logs will be deleted after 01-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
r-vmb-ppc-jenkins_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS

@dmitrygx
Member Author

lab configuration issue and #4616
bot:mlx:retest

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 02-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W2 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W3 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W0 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W1 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W2 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W3 ❓ UNKNOWN
hpc-test-node-gpu_W0 ❓ UNKNOWN
hpc-test-node-gpu_W1 ❓ UNKNOWN
hpc-test-node-gpu_W2 ❓ UNKNOWN
hpc-test-node-gpu_W3 ❓ UNKNOWN
hpc-test-node-legacy_W0 ❓ UNKNOWN
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-legacy_W2 ❓ UNKNOWN
hpc-test-node-legacy_W3 ❓ UNKNOWN
hpc-test-node-new_W0 ❓ UNKNOWN
hpc-test-node-new_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN
r-vmb-ppc-jenkins_W0 ❓ UNKNOWN
r-vmb-ppc-jenkins_W1 ❓ UNKNOWN
r-vmb-ppc-jenkins_W2 ❓ UNKNOWN
r-vmb-ppc-jenkins_W3 ❓ UNKNOWN

@dmitrygx
Member Author

@brminich could you review when you get a chance, please?

src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
src/uct/tcp/tcp_ep.c Outdated Show resolved Hide resolved
@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 02-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-test-node-legacy_W1 ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ❓ UNKNOWN

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 02-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-test-node-legacy_W1 ❓ ABORTED
hpc-test-node-legacy_W2 ❓ ABORTED
r-vmb-ppc-jenkins_W0 ❓ ABORTED
r-vmb-ppc-jenkins_W3 ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W2 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W3 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W0 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W1 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W2 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W3 ❓ UNKNOWN
hpc-test-node-gpu_W0 ❓ UNKNOWN
hpc-test-node-gpu_W1 ❓ UNKNOWN
hpc-test-node-gpu_W2 ❓ UNKNOWN
hpc-test-node-gpu_W3 ❓ UNKNOWN
hpc-test-node-legacy_W0 ❓ UNKNOWN
hpc-test-node-legacy_W3 ❓ UNKNOWN
hpc-test-node-new_W0 ❓ UNKNOWN
hpc-test-node-new_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN
r-vmb-ppc-jenkins_W1 ❓ UNKNOWN
r-vmb-ppc-jenkins_W2 ❓ UNKNOWN

@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 02-Jan-2020

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@@ -21,6 +21,8 @@

#define UCT_TCP_CONFIG_PREFIX "TCP_"

#define UCT_TCP_MAGIC_NUMBER 0xCAFEBABElu
Contributor

better define a full 64 bit constant, or use it as 32-bit

Member Author

defined it as const uint64_t
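
For illustration, the two options raised in this review comment could look like the sketch below. The identifier names are hypothetical and the value is simply the one from the diff widened with zeros; the constant actually merged may differ. The underlying point is that the `lu` suffix produces an `unsigned long`, whose width is 32 bits on some ABIs and 64 bits on others.

```c
#include <stdint.h>

/* Option 1 (hypothetical name): a constant that is 64 bits wide on every
 * ABI. UINT64_C() comes from <stdint.h> and does not depend on the width
 * of `unsigned long`. */
static const uint64_t example_magic_number = UINT64_C(0x00000000CAFEBABE);

/* Option 2 (hypothetical name): treat the value as a genuine 32-bit
 * quantity, so only 4 bytes would go on the wire. */
static const uint32_t example_magic_number32 = UINT32_C(0xCAFEBABE);
```

Either choice makes the number of bytes sent and expected independent of the platform, which matters when client and server are built for different ABIs.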

ucs_error("tcp_ep %p: connection establishment for "
"socket fd %d was unsuccessful", ep, ep->fd);
goto err;
status = uct_tcp_cm_send_magic_number(ep);
Contributor

can't we just add the magic number to uct_tcp_cm_conn_req_pkt for simplicity?

Member Author

no, we can't, since uct_tcp_cm_conn_req_pkt is parsed as part of a TCP AM packet:

| TCP AM header | payload |

where the CONN req is transferred as the payload

but I moved the logic responsible for sending the magic number ahead of all other data into uct_tcp_cm_send_event(), which simplifies the code
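
A rough sketch of the layout described in this reply, with the magic number placed in front of the usual `| TCP AM header | payload |` packet. Everything here is an assumption made for illustration: `example_am_hdr_t` is a stand-in (the real `uct_tcp_am_hdr_t` layout is not shown in this thread), the function names are hypothetical, and values are written in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAGIC UINT64_C(0xCAFEBABE)

/* Hypothetical stand-in for the TCP AM header. */
typedef struct {
    uint8_t  am_id;
    uint32_t length; /* payload length in bytes */
} example_am_hdr_t;

/* Writes | magic | AM header | payload | into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t example_pack_first_pkt(void *buf, size_t buf_size,
                                     uint8_t am_id, const void *payload,
                                     uint32_t payload_len)
{
    uint8_t *p     = buf;
    uint64_t magic = EXAMPLE_MAGIC;
    size_t total   = sizeof(magic) + sizeof(example_am_hdr_t) + payload_len;
    example_am_hdr_t hdr;

    if (total > buf_size) {
        return 0;
    }

    memset(&hdr, 0, sizeof(hdr)); /* zero the padding bytes as well */
    hdr.am_id  = am_id;
    hdr.length = payload_len;

    memcpy(p, &magic, sizeof(magic));
    p += sizeof(magic);
    memcpy(p, &hdr, sizeof(hdr));
    p += sizeof(hdr);
    if (payload_len > 0) {
        memcpy(p, payload, payload_len);
    }
    return total;
}

/* Receiver side: verifies the magic number and returns a pointer to the
 * AM header that follows it, or NULL if the peer did not send the magic. */
static const uint8_t *example_check_magic(const void *buf, size_t len)
{
    uint64_t magic;

    if (len < sizeof(magic)) {
        return NULL;
    }
    memcpy(&magic, buf, sizeof(magic));
    return (magic == EXAMPLE_MAGIC) ? (const uint8_t *)buf + sizeof(magic)
                                    : NULL;
}
```

Because the magic number sits outside the AM packet, the receiver can reject a foreign peer before it interprets any header field such as the length.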

}

void test_listener_flood(entity& test_entity, size_t max_conn,
size_t msg_size = 0) {
Contributor

no need for a default value for msg_size

Member Author

done

@yosefe
Contributor

yosefe commented Dec 27, 2019

[2019-12-27T13:28:09.785Z] "/scrap/jenkins/workspace/ucx-5/contrib/../src/uct/tcp/tcp_cm.c", line 182:
[2019-12-27T13:28:09.785Z]           error #1143: arithmetic on pointer to void or function type
[2019-12-27T13:28:09.785Z]       pkt_hdr         = (uct_tcp_am_hdr_t*)(pkt_buf + magic_number_length);
[2019-12-27T13:28:09.785Z]                                                     ^

@mellanox-github
Contributor

Mellanox CI: FAILED on 5 of 25 workers (click for details)

Note: the logs will be deleted after 03-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-legacy_W2 ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@mellanox-github
Contributor

Mellanox CI: FAILED on 3 of 25 workers (click for details)

Note: the logs will be deleted after 04-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-gpu_W2 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

> [2019-12-27T13:28:09.785Z] "/scrap/jenkins/workspace/ucx-5/contrib/../src/uct/tcp/tcp_cm.c", line 182:
> [2019-12-27T13:28:09.785Z]           error #1143: arithmetic on pointer to void or function type
> [2019-12-27T13:28:09.785Z]       pkt_hdr         = (uct_tcp_am_hdr_t*)(pkt_buf + magic_number_length);
> [2019-12-27T13:28:09.785Z]                                                     ^

@yosefe fixed two static analyzer issues in the 3rd commit
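
For context on the compiler error quoted in this comment: ISO C does not define arithmetic on `void *` (some compilers accept it as an extension), so an expression like `pkt_buf + magic_number_length` is rejected by stricter ones. The change actually made in the 3rd commit is not shown in this thread; the sketch below is one common way to express the same offset portably, with a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: advance an untyped buffer pointer by `offset` bytes.
 * The addition is done on a byte pointer, which is well defined in ISO C,
 * and the result converts back to void * implicitly. */
static void *example_ptr_byte_offset(void *ptr, size_t offset)
{
    return (uint8_t *)ptr + offset;
}
```

With such a helper, the flagged line would become something like `pkt_hdr = (uct_tcp_am_hdr_t*)example_ptr_byte_offset(pkt_buf, magic_number_length);`.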

@mellanox-github
Contributor

Mellanox CI: FAILED on 2 of 25 workers (click for details)

Note: the logs will be deleted after 04-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-new_W1 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@mellanox-github
Contributor

Mellanox CI: FAILED on 2 of 25 workers (click for details)

Note: the logs will be deleted after 04-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

@akesandgren do you have any news regarding testing with this patch?

@yosefe
Contributor

yosefe commented Dec 31, 2019

bot:pipe:retest

@akesandgren

No crashes with commit 23f0ece after ~7 days. Haven't tried with anything newer yet.

Want me to try with the current head?

@dmitrygx
Member Author

dmitrygx commented Jan 1, 2020

> No crashes with commit 23f0ece after ~7 days. Haven't tried with anything newer yet.
>
> Want me to try with the current head?

@akesandgren good news. thank you!
I don't think we need to test the current head, as it doesn't change the fix; it just applies fixes for the review comments from the UCX community. I think we need to pick up the fix for the 1.7.0 release, right?

src/uct/tcp/tcp.h Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
src/uct/tcp/tcp_cm.c Outdated Show resolved Hide resolved
@mellanox-github
Contributor

Mellanox CI: FAILED on 12 of 25 workers (click for details)

Note: the logs will be deleted after 09-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-cavium-jenkins_W2 ❌ FAILURE
hpc-arm-cavium-jenkins_W3 ❌ FAILURE
hpc-arm-hwi-jenkins_W2 ❌ FAILURE
hpc-arm-hwi-jenkins_W3 ❌ FAILURE
hpc-test-node-gpu_W1 ❌ FAILURE
hpc-test-node-gpu_W2 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-test-node-new_W2 ❌ FAILURE
hpc-test-node-new_W3 ❌ FAILURE
r-vmb-ppc-jenkins_W1 ❌ FAILURE
r-vmb-ppc-jenkins_W2 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@mellanox-github
Contributor

Mellanox CI: ABORTED on 25 workers (click for details)

Note: the logs will be deleted after 09-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ ABORTED
hpc-arm-cavium-jenkins_W1 ❓ ABORTED
hpc-arm-cavium-jenkins_W2 ❓ ABORTED
hpc-arm-cavium-jenkins_W3 ❓ ABORTED
hpc-arm-hwi-jenkins_W0 ❓ ABORTED
hpc-arm-hwi-jenkins_W1 ❓ ABORTED
hpc-arm-hwi-jenkins_W2 ❓ ABORTED
hpc-arm-hwi-jenkins_W3 ❓ ABORTED
hpc-test-node-gpu_W0 ❓ ABORTED
hpc-test-node-gpu_W1 ❓ ABORTED
hpc-test-node-gpu_W2 ❓ ABORTED
hpc-test-node-gpu_W3 ❓ ABORTED
hpc-test-node-legacy_W0 ❓ ABORTED
hpc-test-node-legacy_W1 ❓ ABORTED
hpc-test-node-legacy_W2 ❓ ABORTED
hpc-test-node-legacy_W3 ❓ ABORTED
hpc-test-node-new_W0 ❓ ABORTED
hpc-test-node-new_W1 ❓ ABORTED
hpc-test-node-new_W2 ❓ ABORTED
hpc-test-node-new_W3 ❓ ABORTED
r-vmb-ppc-jenkins_W1 ❓ ABORTED
r-vmb-ppc-jenkins_W2 ❓ ABORTED
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

src/uct/tcp/tcp_ep.c Outdated Show resolved Hide resolved
@mellanox-github
Contributor

Mellanox CI: FAILED on 12 of 25 workers (click for details)

Note: the logs will be deleted after 09-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-cavium-jenkins_W2 ❌ FAILURE
hpc-arm-cavium-jenkins_W3 ❌ FAILURE
hpc-arm-hwi-jenkins_W2 ❌ FAILURE
hpc-arm-hwi-jenkins_W3 ❌ FAILURE
hpc-test-node-gpu_W1 ❌ FAILURE
hpc-test-node-gpu_W2 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-test-node-new_W2 ❌ FAILURE
hpc-test-node-new_W3 ❌ FAILURE
r-vmb-ppc-jenkins_W1 ❌ FAILURE
r-vmb-ppc-jenkins_W2 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 09-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ ABORTED
hpc-arm-cavium-jenkins_W2 ❓ ABORTED
hpc-arm-cavium-jenkins_W3 ❓ ABORTED
hpc-arm-hwi-jenkins_W0 ❓ ABORTED
hpc-arm-hwi-jenkins_W1 ❓ ABORTED
hpc-arm-hwi-jenkins_W2 ❓ ABORTED
hpc-arm-hwi-jenkins_W3 ❓ ABORTED
hpc-test-node-gpu_W0 ❓ ABORTED
hpc-test-node-gpu_W1 ❓ ABORTED
hpc-test-node-gpu_W2 ❓ ABORTED
hpc-test-node-gpu_W3 ❓ ABORTED
hpc-test-node-legacy_W2 ❓ ABORTED
hpc-test-node-legacy_W3 ❓ ABORTED
r-vmb-ppc-jenkins_W1 ❓ ABORTED
r-vmb-ppc-jenkins_W2 ❓ ABORTED
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-test-node-legacy_W0 ❓ UNKNOWN
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-new_W0 ❓ UNKNOWN
hpc-test-node-new_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN

@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 09-Jan-2020

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

dmitrygx commented Jan 4, 2020

bot:pipe:retest

@shamisp
Contributor

shamisp commented Jan 7, 2020

I'm curious, what is the typical protection in other TCP/IP-based applications that assume certain wire-level protocols? Shall we make the magic value user-configurable, just in case it conflicts with something else?

@akesandgren

The main reason for the protection is against random port sniffers and the like. I see no reason to have it configurable.

@shamisp
Contributor

shamisp commented Jan 8, 2020

@dmitrygx how much effort is it to backport this to v1.7.0? (this is a follow-up on the email thread)

@dmitrygx
Member Author

dmitrygx commented Jan 8, 2020

> @dmitrygx how much effort is it to backport this to v1.7.0? (this is a follow-up on the email thread)

@shamisp not a problem at all, since the TCP related code in v1.7.x is almost the same as in the master branch. Need to get @yosefe’s approval and squash&merge this to master.

@shamisp
Contributor

shamisp commented Jan 8, 2020

@yosefe can you please take a look. 👍 from my side.

@shamisp
Contributor

shamisp commented Jan 8, 2020

BTW actually @brminich can approve this as well and it should be enough.

} else if (ep->ctx_caps & UCS_BIT(UCT_TCP_EP_CTX_TYPE_RX)) {
/* If the EP supports RX only, destroy it */
} else if ((ep->ctx_caps == 0) ||
(ep->ctx_caps & UCS_BIT(UCT_TCP_EP_CTX_TYPE_RX))) {
Contributor

is a situation where the EP has no caps possible?

Member Author

yes, this is the situation when we are trying to receive the "magic number" packet from the peer that connected to us and it fails for some reason
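
To make the condition in the diff concrete, here is a small sketch of the capability check. The enum and macro are hypothetical mirrors of `UCT_TCP_EP_CTX_TYPE_RX` and `UCS_BIT()`, and the sketch covers only the `else if` expression itself, not the earlier branch, which is not shown in this thread.

```c
#include <stdint.h>

#define EXAMPLE_BIT(i) (1u << (i))

/* Hypothetical mirror of the EP context types named in the diff. */
typedef enum {
    EXAMPLE_CTX_TYPE_TX,
    EXAMPLE_CTX_TYPE_RX
} example_ctx_type_t;

/* Returns 1 when the capabilities match the new `else if` condition:
 * either no capability at all (the EP was accepted, but the magic number
 * was never received successfully) or the RX capability is set. */
static int example_no_caps_or_rx(uint8_t ctx_caps)
{
    return (ctx_caps == 0) ||
           ((ctx_caps & EXAMPLE_BIT(EXAMPLE_CTX_TYPE_RX)) != 0);
}
```

An EP with an empty capability mask is therefore handled by the same branch as an RX-only EP.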

cm_pkt_length = sizeof(*conn_pkt);
cm_pkt_length = sizeof(*conn_pkt);

if (ep->conn_state == UCT_TCP_EP_CONN_STATE_CONNECTING) {
Contributor

is it possible to send UCT_TCP_CM_CONN_REQ in a state other than UCT_TCP_EP_CONN_STATE_CONNECTING?

Member Author

yes, when a UCT/TCP EP is already in the CONNECTED state because it was created by accepting a connection, and the user then wants to create a new EP to the same peer (we find the existing EP in the khash and send a CONN_REQ message to the peer to enable the RX capability on the peer's side)

Contributor

maybe always sending magic number would be simpler?

@@ -1001,6 +1017,66 @@ unsigned uct_tcp_ep_progress_put_rx(uct_tcp_ep_t *ep)
return 1;
}

static unsigned uct_tcp_ep_progress_data_rx(uct_tcp_ep_t *ep)
{

Contributor

extra line

Member Author

yes, fixed

@shamisp
Contributor

shamisp commented Jan 8, 2020

👍

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 22 workers (click for details)

Note: the logs will be deleted after 15-Jan-2020

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W2 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W3 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W0 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W1 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W2 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W3 ❓ UNKNOWN
hpc-test-node-gpu_W0 ❓ UNKNOWN
hpc-test-node-gpu_W1 ❓ UNKNOWN
hpc-test-node-gpu_W2 ❓ UNKNOWN
hpc-test-node-gpu_W3 ❓ UNKNOWN
hpc-test-node-legacy_W0 ❓ UNKNOWN
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-legacy_W2 ❓ UNKNOWN
hpc-test-node-legacy_W3 ❓ UNKNOWN
hpc-test-node-new_W0 ❓ UNKNOWN
hpc-test-node-new_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN
r-vmb-ppc-jenkins_W1 ❓ UNKNOWN

@dmitrygx
Member Author

dmitrygx commented Jan 8, 2020

@shamisp @brminich force-pushed changes to fix the year in the new test/gtest/uct/tcp/test_tcp.cc file (2019 -> 2020) and removed trailing ;; in src/uct/tcp/tcp_cm.c:585 (;; -> ;)

@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 16-Jan-2020

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@brminich brminich merged commit b41c992 into openucx:master Jan 9, 2020

Successfully merging this pull request may close these issues.

1.7.0-rc1 showing tcp_ep.c:739 Assertion hdr->length <= (iface->config.rx_seg_size - sizeof(uct_tcp_am_hdr_t))
7 participants