add scaler for temporal #6191

Open
wants to merge 6 commits into
base: main

Prajithp

Implement a temporal scaler

Checklist

Fixes #4724

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@Prajithp Prajithp requested a review from a team as a code owner September 26, 2024 08:17
@febinct febinct mentioned this pull request Sep 26, 2024
7 tasks
@cretz

cretz commented Sep 26, 2024

See comment at temporalio/temporal#33 (comment). Temporal's KEDA approach may slightly differ. Will have the engineers review, but we may suggest slight differences.

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@febinct

febinct commented Sep 30, 2024

Any suggestions or comments, @cretz?

@robholland

We're currently discussing which use cases we would like the scaler to support. We'll be in a position to give some feedback/direction on Friday the 4th.

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@jhecking

jhecking commented Oct 4, 2024

We rolled out this new scaler to one of our dev clusters (rebased on top of the v2.15 release branch). Activation/deactivation is working as expected. However, the keda-operator pod eats up all of its allocated CPU whenever the Temporal trigger is active. When I pause the ScaledObject with the Temporal trigger, CPU utilisation goes back to near zero. There are several other ScaledObjects with Prometheus triggers which don't cause this problem.

I don't see anything relevant in the keda-operator logs, even on DEBUG log level. I enabled profiling and this is the flame graph I see when this is happening:

[Image: Screenshot 2024-10-04 at 5 56 47 PM]

[go tool pprof -http=:8081 "http://localhost:8082/debug/pprof/profile?seconds=60"]

For reference, here is a "normal" flame graph when all the Temporal triggers are paused:

[Image: Screenshot 2024-10-04 at 5 57 31 PM]

One detail that might be relevant is that KEDA connects to the Temporal server via our Consul service mesh, i.e. there is a Consul proxy injected into the keda-operator pod and the Temporal scaler is configured to connect to localhost:7233. KEDA is able to connect to the Temporal server, though; there are no connection errors. We use this same configuration for all the Temporal worker services in the same cluster that KEDA is supposed to scale, and none of them show this behaviour.

I'm a bit at a loss as to how to debug this further. Any suggestions?

Development

Successfully merging this pull request may close these issues.

Add Temporal.io Scaler
5 participants