This repository has been archived by the owner on Nov 4, 2021. It is now read-only.

Replace ioredis #178

Closed

mariusandra opened this issue Feb 19, 2021 · 4 comments

Comments

@mariusandra
Collaborator

mariusandra commented Feb 19, 2021

Last night I spent way too long trying to track down an impossible bug in the plugin server, which turned out to be caused by a bug in ioredis.

Basically, the following line would get stuck forever one time out of a thousand:

const hooksCacheKey = `@posthog/plugin-server/hooks/${team.id}`
const hooksCacheValue = await this.pluginsServer.redis.get(hooksCacheKey) // hangs here ~1 time in 1000

Removing any mention of redis from the ingestion pipeline solved the issue.

However, various plugins still use redis, and we can't just have them stalling randomly, as they still sometimes do:

[screenshot]

Luckily plugins now have timeouts, so this doesn't do more than stall ingestion for 30 seconds... and leave some open handles hanging.

Still, crashing for no reason is not acceptable, even if it's one time in 100k.

What to do? Fix the bug in ioredis? Replace ioredis with redis?

I think celery.node required ioredis for some reason, but since we have integrated that code into our codebase, perhaps we can work around it... or even use both libraries side by side (one for celery, one for other uses)... And perhaps switching the redis library won't help at all? But I'm not sure what to do then.

@macobo
Contributor

macobo commented Feb 19, 2021

Flyby thought: Perhaps there are some options in the redis library that cause these issues, or that could help us avoid them? https://github.com/luin/ioredis/blob/0b001e0f3313ad53e9f62a8ab3a70c478205698b/lib/redis/index.ts#L51-L90

It does sound like some sort of 'infinite retry' bug, e.g. with commands or connections?

Perhaps even maxRetriesPerRequest, which is currently set to -1?

const redis = new Redis(serverConfig.REDIS_URL, { maxRetriesPerRequest: -1 })
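
For reference, a sketch of what tuning those options might look like (the retry count, timeout, and backoff values here are illustrative guesses, not settings verified against this bug):

import Redis from 'ioredis'

// Illustrative only: swap the infinite per-request retries (-1) for a
// finite count plus timeouts, so a wedged command rejects with an error
// instead of hanging forever.
const redis = new Redis(process.env.REDIS_URL!, {
    maxRetriesPerRequest: 3, // reject after 3 retries instead of retrying forever
    connectTimeout: 5000, // give up on a connection attempt after 5s
    retryStrategy: (times) => Math.min(times * 200, 2000), // backoff (ms) between reconnect attempts
})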

@mariusandra
Collaborator Author

Appreciate the outside perspective, but I'm not sure that would help. The PR that tries to fix this notes that "maxRetriesPerRequest does not help with this" :).

It seems to be getting stuck behind some socket connection issue and does not even try to retry.

I think the first step in resolving this would be to abstract all the redis calls into db.ts and wrap them all with timeouts, to be sure that all the crashes are due to this... and the next step would be to swap it out for another redis library to see if that makes any difference...
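
A minimal sketch of that kind of wrapper (a hypothetical helper, not the actual db.ts code): race each redis call against a timer so a stuck socket surfaces as an error instead of blocking ingestion. Note that Promise.race does not cancel the underlying call; it only stops the caller from waiting on it.

function redisCallWithTimeout<T>(call: Promise<T>, ms: number, tag: string): Promise<T> {
    let timer!: NodeJS.Timeout
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Redis call timed out: ${tag}`)), ms)
    })
    // Whichever settles first wins; always clear the timer afterwards.
    return Promise.race([call, timeout]).finally(() => clearTimeout(timer))
}

// Usage:
// const hooksCacheValue = await redisCallWithTimeout(redis.get(hooksCacheKey), 5000, 'hooks cache get')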

@mariusandra
Collaborator Author

Still pending final analysis, but it seems like introducing connection pooling was the real answer.

ioredis is by far the best nodejs redis library and it would have been a shame to swap it out for something older and unmaintained... or new and untested.
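
A rough sketch of what pooled access could look like, assuming the generic-pool package and ioredis v4 typings (the actual pool code in the plugin server may differ): each caller borrows a client for one operation, so a single wedged socket no longer blocks every consumer.

import Redis from 'ioredis'
import { createPool } from 'generic-pool'

// Pool of ioredis clients; min/max sizes here are illustrative.
const redisPool = createPool<Redis.Redis>(
    {
        create: async () => new Redis(process.env.REDIS_URL!),
        destroy: async (client) => {
            await client.quit()
        },
    },
    { min: 2, max: 10 }
)

// Borrow a client, run one operation, and always return the client.
async function withRedis<T>(fn: (client: Redis.Redis) => Promise<T>): Promise<T> {
    const client = await redisPool.acquire()
    try {
        return await fn(client)
    } finally {
        await redisPool.release(client)
    }
}

// Usage:
// const value = await withRedis((redis) => redis.get(hooksCacheKey))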

@mariusandra
Collaborator Author

Connection pooling seems to have worked. The rate of errors dropped drastically and the ones still remaining seem to be related to shutdowns: #210
