Adjust Queue Logging #202
Conversation
Can you include a screenshot of what the logging output looks like when you are able to get scheduled? I'll approve once I see that it looks as expected.
N.B. you can use `psimulate test sleep` rather than running an actual simulation to reduce our cluster impact.
Like Rajan, I'd also like to actually see a screenshot of the logging in action.
```diff
 finished_jobs = self._get_finished_jobs()
 start = time.time()
 results = []
 for job_id in finished_jobs:
     result = self._get_result(job_id)
     if result is not None:
         results.append(result)
-self._logger.info(
+self._logger.debug(
```
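The hunk above demotes a per-poll log line from info to debug. As a self-contained illustration of that pattern, here is a minimal sketch; the class and its method bodies are hypothetical stand-ins, not the project's actual registry code:

```python
import logging
import time

logger = logging.getLogger(__name__)


class ResultCollector:
    """Hypothetical stand-in for the registry object shown in the diff."""

    def _get_finished_jobs(self):
        return []  # placeholder for the real registry query

    def _get_result(self, job_id):
        return None  # placeholder for the real result lookup

    def collect_results(self):
        finished_jobs = self._get_finished_jobs()
        start = time.time()
        results = []
        for job_id in finished_jobs:
            result = self._get_result(job_id)
            if result is not None:
                results.append(result)
        # Per-poll detail goes to debug so routine polling stays quiet;
        # operators can still opt in by raising the log level to DEBUG.
        logger.debug(
            "Retrieved %d results in %.2f seconds",
            len(results),
            time.time() - start,
        )
        return results
```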
I like this change
```diff
 )
 self._status["workers"] = q_workers
-self._status["done"] = 100 * self._status["finished"] / self._status["total"]
+self._status["done"] = (
```
so "done" for this context is basically finished + successful?
Yeah, I don't think the "done" percentage includes failed jobs.
Looks good. To be clear, we expect these two constraints to hold?

```
max_workers == inactive_workers + running
total_jobs == pending + running + failed + successful
```
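For illustration, those invariants could be checked directly at runtime. A minimal sketch, assuming the counters live in the status dict under these names and that each worker processes one job at a time (so the same "running" count appears in both identities):

```python
def check_status_invariants(status: dict, max_workers: int, total_jobs: int) -> None:
    # Hypothetical checks; the key names are assumptions based on the
    # constraints above, not the actual status-dict schema.
    assert max_workers == status["inactive_workers"] + status["running"]
    assert total_jobs == (
        status["pending"]
        + status["running"]
        + status["failed"]
        + status["successful"]
    )
```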
Adjust Queue Logging
Description
bugfix / documentation
Changes and notes
From the ticket:
As I imply above, thinking about the "jobs" submitted to the cluster is a sort of category error; it's the workers that are really getting submitted. Ultimately, I thought the most sensible thing would be to leave "pending" as-is, with the understanding that this now means just "idle jobs awaiting workers". Instead, at the "Queue All" level, I added a listing for "inactive workers", that is, the number of workers missing from the number that were initially submitted, which are either (a) waiting for cluster resources or (b) have completed all jobs in the queue and quit.
I don't think there's a lazy way to disambiguate (a) from (b); you'd need to pull information from outside of the registry. I figured the two situations can be resolved by context (one happens mostly at the beginning of the sim, and one mostly at the end).
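As a sketch of the arithmetic described in the ticket (the function and parameter names are hypothetical; the real registry presumably derives these counts differently):

```python
def count_inactive_workers(submitted_workers: int, running_workers: int) -> int:
    # "Inactive" deliberately conflates two states the registry alone
    # cannot distinguish: (a) workers still queued for cluster resources,
    # and (b) workers that drained the job queue and exited.
    return submitted_workers - running_workers
```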
I took out "workers", assuming the same information is given by "running".
I renamed this job status (but not the underlying FinishedQueue) to "Successful" for the purposes of logging (a sketch of the label mapping follows these notes).
I also changed some log info statements to debug where IMO they deserve it.
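A minimal sketch of the logging-side rename mentioned above; `FinishedQueue` is the internal name from the PR, while the mapping itself is an illustration, not the actual implementation:

```python
# Hypothetical mapping: the internal queue name stays the same, only the
# label used in log output changes, per the note above.
STATUS_DISPLAY_NAMES = {"FinishedQueue": "Successful"}


def display_status(internal_name: str) -> str:
    return STATUS_DISPLAY_NAMES.get(internal_name, internal_name)
```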
Testing
Tested against nutrition optimization, but I am not actually getting allocated any workers at the moment.