
Total running Jobs must not exceed maxScale - Running jobs #528

Merged: 3 commits into kedacore:master on Jan 15, 2020

Conversation

@balchua (Contributor) commented on Dec 29, 2019

This PR is to honor queueLength while taking into account the number of running Jobs.
In short, the number of new Jobs to create will be queueLength - runningJobs, so the total running Jobs never exceed the limit.

Running Jobs are those Jobs which are not in a Completed or Failed status.
This should address issue #525.
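
A minimal sketch of that idea, assuming the standard Kubernetes batch/v1 API (the function names below are illustrative, not the actual scale_jobs.go code):

```go
package scaling

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// isJobFinished reports whether a Job has reached a Complete or Failed
// condition; anything else still counts as "running" for scaling purposes.
func isJobFinished(job *batchv1.Job) bool {
	for _, c := range job.Status.Conditions {
		if (c.Type == batchv1.JobComplete || c.Type == batchv1.JobFailed) && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

// jobsToCreate returns how many new Jobs to spawn so that the total number of
// running Jobs never exceeds maxScale (derived from the queueLength metric).
func jobsToCreate(existing []batchv1.Job, maxScale int64) int64 {
	var running int64
	for i := range existing {
		if !isJobFinished(&existing[i]) {
			running++
		}
	}
	if n := maxScale - running; n > 0 {
		return n
	}
	return 0
}
```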

@zroubalik (Member) commented

LGTM, thanks @balchua!

@zroubalik self-requested a review on January 2, 2020 at 21:33
@Cottonglow (Contributor) left a comment

Just tested this with a Kafka scaler and it seems to work fine!

One thing I've noticed is that the scaler will create a job per message even if there are jobs already running. This isn't something this PR introduced, but it is something this PR can help fix.

It might be worth adding code to only create as many jobs as there are messages in the queue; otherwise you might end up with 5 jobs for 1 message.
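
A minimal sketch of that suggestion (purely illustrative names, not code from this PR): cap the number of new Jobs by the number of messages actually waiting.

```go
package scaling

// capByPendingMessages is a hypothetical helper for the suggestion above:
// never create more Jobs than there are messages waiting in the queue.
func capByPendingMessages(newJobs, pendingMessages int64) int64 {
	if pendingMessages < newJobs {
		return pendingMessages
	}
	return newJobs
}
```

For example, capByPendingMessages(5, 1) would create 1 Job rather than 5 for a single message.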

@balchua (Contributor, Author) commented on Jan 3, 2020

@Cottonglow setting queueLength to 1 prevents it from having more running jobs than the number of messages.

@balchua (Contributor, Author) commented on Jan 3, 2020

I realized that you are using the Kafka scaler. I wonder how it works for Kafka, since the scaler only checks for metrics named queueLength and the Kafka scaler doesn't use queueLength.

@stgricci (Contributor) left a comment

Looks good! Thanks @balchua!

Review comment on pkg/handler/scale_jobs.go (outdated, resolved)
@Cottonglow (Contributor) commented

@balchua
Kafka uses lagThreshold, which from my understanding is the threshold for the number of messages on the queue that haven't been consumed yet before kicking off scaling.
Looking at the code, it looks like handleScaleJob treats lagThreshold the same as queueLength. (This might need to be considered in the future for Kafka.)

I've experimented by setting lagThreshold to 1 and it seems to only create 1 job regardless of the number of messages on my queue. I'm not sure what behaviour you're seeing with your scaler?

If the idea is to have 1 job per message (which is what I understand the plan to be from the docs), then this wouldn't work, unless this is a problem with the Kafka scaler only.

This can be a consideration for a future PR if it is too much work, though, as this PR does fix the problem of the number of jobs exceeding the max value, which is very useful now.

@balchua (Contributor, Author) commented on Jan 6, 2020

@Cottonglow I was using the RabbitMQ scaler.

The code in

```go
// only metrics reported under the name "queueLength" contribute to the total
if m.MetricName == "queueLength" {
	metricValue, _ = m.Value.AsInt64()
	queueLength += metricValue
}
```

only considers a metricName of queueLength, so I was just wondering how it works for the Kafka scaler.

@balchua (Contributor, Author) commented on Jan 6, 2020

Forget my previous comment 😁. I got it figured out: handleScaleJob sort of "forces" the metric name to be queueLength, as @Cottonglow mentioned.
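
A rough, hypothetical illustration of that "forcing" (the types and names below are simplified stand-ins, not the real KEDA interfaces): the job-scaling path requests metrics under the fixed name queueLength, so a Kafka trigger's lag ends up being summed just like a queue length.

```go
package scaling

// metricValue is a simplified stand-in for a scaler metric; not a real KEDA type.
type metricValue struct {
	MetricName string
	Value      int64
}

// scaler is a hypothetical interface: a scaler reports its current metrics,
// labelled with the metric name it was asked for.
type scaler interface {
	getMetrics(requestedName string) []metricValue
}

// sumQueueLength mirrors the snippet quoted earlier: the job path asks every
// scaler for "queueLength", so only values reported under that name are summed,
// whether the trigger's native setting is called queueLength or lagThreshold.
func sumQueueLength(scalers []scaler) int64 {
	var queueLength int64
	for _, s := range scalers {
		for _, m := range s.getMetrics("queueLength") {
			if m.MetricName == "queueLength" {
				queueLength += m.Value
			}
		}
	}
	return queueLength
}
```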

@balchua (Contributor, Author) commented on Jan 6, 2020

Well, it is still going to start one job per available message in the "queue", limited by the trigger's queueLength or lagThreshold metric. This helps avoid spawning new jobs while previous jobs are long-running and the messages have not yet been dequeued.

@jeffhollan (Member) commented

@ahmelsayed is this one ready to merge?

@balchua (Contributor, Author) commented on Jan 9, 2020

@jeffhollan, I have yet to work on the review feedback given above. 😁

@balchua (Contributor, Author) commented on Jan 10, 2020

@stgricci I have updated the code to make sure maxScale is not negative. Thanks for the review.
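
For completeness, a hypothetical test of that guard, reusing the jobsToCreate sketch from earlier: if more Jobs are already running than maxScale allows, zero new Jobs should be requested rather than a negative count.

```go
package scaling

import (
	"testing"

	batchv1 "k8s.io/api/batch/v1"
)

// Three Jobs with no Complete/Failed condition count as running;
// with a maxScale of 2, the result must clamp to zero.
func TestJobsToCreateNeverNegative(t *testing.T) {
	running := make([]batchv1.Job, 3)
	if got := jobsToCreate(running, 2); got != 0 {
		t.Fatalf("expected 0 new Jobs, got %d", got)
	}
}
```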

@balchua (Contributor, Author) commented on Jan 15, 2020

@ahmelsayed I think this is good to go. @stgricci gave his thumbs up. 😁

@ahmelsayed merged commit 87a777a into kedacore:master on Jan 15, 2020
preflightsiren pushed a commit to preflightsiren/keda that referenced this pull request on Nov 7, 2021 (Signed-off-by: Zbynek Roubalik <zroubali@redhat.com>)