Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlagsyncd]Fix mlag socket read fail #1832

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

pettershao-ragilenetworks
Copy link
Contributor

@pettershao-ragilenetworks pettershao-ragilenetworks commented Jul 22, 2021

What I did
fix mclagysncd server socket read fail
Why I did it
if socket read fail, mclagsyncd can't communicate with iccpd, the whole mlag can't work
How I verified it
check the socket state
Details if related

issue description:
from below strace log, the read size for fd 34 is 0, which means the initiazation of this variable is not correct. MSG_BATCH_SIZE should be a macro, otherwise it may be inited after size and cause size 0 .

root@sonic:/home/admin# strace -f -p 6681 (only show 13(server fd)and 34(conn fd))
[pid  6681] setsockopt(13, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid  6681] setsockopt(13, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid  6681] bind(13, {sa_family=AF_INET, sin_port=htons(2626), sin_addr=inet_addr("127.0.0.6")}, 16) = 0
[pid  6681] listen(13, 2) 
[pid  6681] accept(13, {sa_family=AF_INET, sin_port=htons(40028), sin_addr=inet_addr("127.0.0.1")}, [32624->16]) = 34
[pid  6681] write(1, "Connected!\n", 11) = 11
[pid  6681] epoll_ctl(12, EPOLL_CTL_ADD, 34, {EPOLLIN, {u32=34, u64=34}}) = 0
[pid  6681] epoll_wait(12, [], 2, 0)    = 0
[pid  6681] epoll_wait(12,
[{EPOLLIN, {u32=34, u64=34}}], 2, -1) = 1
[pid  6681] read(34, "", 0)             = 0   <<====here size is empty, which is confirmed in log, both m_bufsize and m_pos is 0.
[pid  6681] sendto(9, "<13>Jul 21 06:50:44 mclagsyncd: "..., 65, MSG_NOSIGNAL, NULL, 0) = 65
[pid  6681] futex(0x7f707811d1a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid  6681] close(13)                   = 0

root@sonic:/home/admin# netstat -ap | grep 2626
tcp        0      0 127.0.0.6:2626          0.0.0.0:*               LISTEN      4817/mclagsyncd
tcp        946    0 127.0.0.6:2626          localhost:33166         ESTABLISHED 6681/mclagsyncd     //receive queue is not empty since read failed
tcp        0      0 localhost:33166         127.0.0.6:2626          ESTABLISHED 6682/iccpd


below show a normal read strace log:

[pid 15027] read(34, "\1\2\24\0\3\0\f\0test_sync_fd", 1048576) = 20

@pettershao-ragilenetworks
Copy link
Contributor Author

pettershao-ragilenetworks commented Jul 22, 2021

@lguohan @qiluo-msft @Praveen-Brcm help review this, is a sometime issue, thanks!

@Praveen-Brcm
Copy link
Contributor

@lguohan @qiluo-msft @Praveen-Brcm help review this, is a sometime issue, thanks!

@pettershao-ragilenetworks Thanks for the change. Am not added as reviwer, if someone can add me will provide approals from my end.
Thanks- Praveen.

@pettershao-ragilenetworks
Copy link
Contributor Author

@lguohan @qiluo-msft help forward this, thanks!

@@ -204,7 +204,6 @@ namespace swss {
std::string m_system_mac;
std::set<vlan_mbr> m_vlan_mbrship; //set of vlan,mbr tuples

const int MSG_BATCH_SIZE;
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const int MSG_BATCH_SIZE;
static const int MSG_BATCH_SIZE = 256;
``` #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fix is easier than defining as a macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, accepted.

@pettershao-ragilenetworks
Copy link
Contributor Author

@lguohan help merge this, thanks!

@pettershao-ragilenetworks
Copy link
Contributor Author

@lguohan @qiluo-msft help merge this, thanks!

@pettershao-ragilenetworks
Copy link
Contributor Author

@lguohan @qiluo-msft help forward this, thanks!

@qiluo-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@pettershao-ragilenetworks
Copy link
Contributor Author

@lguohan @qiluo-msft help forward this, thanks!

@qiluo-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@pettershao-ragilenetworks
Copy link
Contributor Author

pettershao-ragilenetworks commented Jan 19, 2022

@lguohan @qiluo-msft As it is approved long time ago, help merge this, thanks!

@msosyak
Copy link
Contributor

msosyak commented Jan 24, 2022

@pettershao-ragilenetworks The issue you tried to fix here is already fixed in #2112 by fixing the initialization order.

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
)

- What I did
Add NVIDIA Copyright header to "mellanox" files

- How I did it
Add NVIDIA Copyright header as a comment for Mellanox files

- How to verify it
Basic checkers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants