Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/buffersize Causing high memory usage #354

Merged
merged 4 commits into from
May 30, 2020

Conversation

x00b
Copy link
Contributor

@x00b x00b commented May 5, 2020

Websocket buffer size was causing high memory usage during ws conn, issues tracking:

Suggest in #282 , discussion in #328 and benchmark/testing in #350

ps: re-opening pr

@beshoo
Copy link
Collaborator

beshoo commented May 8, 2020

@x00b
I note that the total allocate memory is growing but slowly....
Is that can be managed by GOGC?

Have a look
There is 2 session hooked to the api.

I Started with :
Alloc = 3 MiB # TotalAlloc = 21 MiB # Sys = 15 MiB # NumGC = 10
After 7 hours
Alloc = 5 MiB # TotalAlloc = 35 MiB Sys = 17 MiB # NumGC = 205

Now i made a lot of QR requests, arround 300 request then i viewed the memory

Alloc = 23 MiB��TotalAlloc = 80 MiB�����Sys = 38 MiB���� # NumGC = 223�
Moreover in "top" the cpu and mem is zero.

What do you think?
But keep in mind that without this update l guarantee more than 7 restarts for the api.

@x00b
Copy link
Contributor Author

x00b commented May 8, 2020

The default GC behavior can cause this unnecessary spikes, i think it might be related to golang/go#14812 and maybe https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/ too.

I know that this PR really fix the high memory usage but the GC might be necessary to control directly in the application. But lets keep investigating it.

@beshoo
Copy link
Collaborator

beshoo commented May 8, 2020

Do you have any idea or method of controlling GC in the application?

@gabstv
Copy link
Contributor

gabstv commented May 8, 2020

I went to gorilla/websocket to check what goes when the size is set to zero:

const (
    defaultReadBufferSize  = 4096
    defaultWriteBufferSize = 4096
)

@beshoo
Copy link
Collaborator

beshoo commented May 8, 2020

@gabstv Than you for your reply, I am not sure I followed you correctly.
We already changed these values to zero in order to limit down the high memory usage!

So what is the purpose of adding 4096 value?

@x00b
Copy link
Contributor Author

x00b commented May 9, 2020

When we set buffer size to zero gorilla/websocket change it to 4096, that's what he said.

But the default value is working fine. There's no need to increase it like it was and i think not even changing this PR to use other value instead of zero(so websocket doesn't set a higher default size) value)

@beshoo
Copy link
Collaborator

beshoo commented May 11, 2020

Today I decide to make a very ruff test.
I sent more than 10000 requests to the QR loader within 1 min.
API did not go down at all, but I note if I run the QR and check for the profile I will get this memory trace:

INFO[1176] Url /api/profile/me
Alloc = 1041 MiB        TotalAlloc = 4956 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1177] Url /api/profile/me
Alloc = 1042 MiB        TotalAlloc = 4957 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1178] Url /api/profile/me
Alloc = 1043 MiB        TotalAlloc = 4959 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1178] Url /api/profile/me
Alloc = 1044 MiB        TotalAlloc = 4960 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1179] Url /api/profile/me
Alloc = 1045 MiB        TotalAlloc = 4961 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1180] Url /api/profile/me
Alloc = 1047 MiB        TotalAlloc = 4962 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1181] Url /api/profile/me
Alloc = 1048 MiB        TotalAlloc = 4963 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1181] Url /api/profile/me
Alloc = 1049 MiB        TotalAlloc = 4964 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1182] Url /api/profile/me
Alloc = 1050 MiB        TotalAlloc = 4966 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1183] Url /api/profile/me
Alloc = 1051 MiB        TotalAlloc = 4967 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1184] Url /api/profile/me
Alloc = 1053 MiB        TotalAlloc = 4968 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1184] Url /api/profile/me
Alloc = 1054 MiB        TotalAlloc = 4969 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1185] Url /api/profile/me
Alloc = 1055 MiB        TotalAlloc = 4970 MiB   Sys = 1860 MiB  NumGC = 41
INFO[1186] Url /api/profile/me
Alloc = 1056 MiB        TotalAlloc = 4972 MiB   Sys = 1860 MiB  NumGC = 41

Note that each request will cost 1 MB! is that reasonable?
Overall, my server has 128 GB of ram.

Have look to the top

`
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

5769 root 20 0 2072048 1.8g 7924 R 21.5 1.4 12:41.30 wa-api
`
it is using 21% of the CPU - I have 24 core.
But ram 1.4
I am not sure if this is an accurate way to test

But I believe we have to utilize GC somehow in the base lib.
PS: While testing I note this message jump a lot
ERRO[1752] ErrConnectionFailed connection to WhatsApp servers failed: websocket: close 1006 (abnormal closure): unexpected EOF Any idea what is going on here?

@x00b
Copy link
Contributor Author

x00b commented May 12, 2020

Are you disconnecting/deleting the Wac after Qrcode timeout? About the ErrConnectionFailed... Try implementing the HandlerError logic, if you search the issues here you see that this Err is handled pretty well with handler error

About the base lib implementing a GC call by default doesn't matter until we find exactly where those bugs are.

I am having whatsapp conn issues too. But don't think might be related to this PR but with latest whatsapp web updates(which by the way is going to add new features).

@beshoo
Copy link
Collaborator

beshoo commented May 12, 2020

@x00b Great I found how to disconnect if the Wac.Info is nill which means it is not logged in, then i do disconnect for the Wac.

I test the benchmark "More than 10k Qr scan"
Alloc = 18 MiB TotalAlloc = 594 MiB Sys = 85 MiB NumGC = 43

How I did that:
In the NewConnection I add this :
time.AfterFunc(15*time.Second , func() { checkWac(wac) })

along with this simple func

func checkWac(wac *wa.Conn){

                if wac.Info == nil {
                        wac.Disconnect()
                }
}

and I note that when I tried to restore the sessions the wac kept connected "it is not logged in but keeps connected", every 15 seconds I do a newConnection to check if the mobile is online or not!
Now what I add, ef session is offline I will disconnect the wac as well.
Is this a good idea?

THANKS!

@beshoo
Copy link
Collaborator

beshoo commented May 28, 2020

@x00b
Have a look with me, after 8 days with one session I have this status

Alloc = 19 MiB TotalAlloc = 203 MiB Sys = 36 MiB NumGC = 5316
Note the NumGC , what do you think is it normal?

@x00b
Copy link
Contributor Author

x00b commented May 28, 2020

It might be something in your code i think. Probably an infinite for {} loop that is running in a goroutine or something like that that is causing a high memory usage and the garbage collector cannot work properly(clean the not-used memory).

If you have an infinite for loop, it might be the cause... Never do infinite for{} loop in Go

@SchulteMK SchulteMK merged commit 81a5d3e into Rhymen:master May 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants