Adjust Queue Logging #202

patricktnast · 2024-01-04T22:09:10Z

Description

Category:
bugfix / documentation
JIRA issue: MIC-4650

Changes and notes

From the ticket:

Overview: The psimulate logger previously had buckets for pending, running, and workers, which date from when we submitted all jobs to slurm at once. Now the meaning is somewhat different, or at least ambiguous, because "pending" means all queued jobs, not necessarily the set of jobs (or rather, in this case, workers) that have been submitted to slurm but are still waiting to receive resources.

As I imply above, thinking about the "jobs" submitted to the cluster is a sort of category error: it's the workers that are really getting submitted. Ultimately, I thought the most sensible thing would be to leave "pending" as-is, with the understanding that it now just means "idle jobs awaiting workers". Rather than redefining it, at the "Queue All" level I added a listing for "inactive workers", that is, the number of workers "missing" from the number initially submitted. These are workers that either a) are waiting for cluster resources or b) have completed all jobs in the queue and quit on their own.

I don't think there's a lazy way to disambiguate (a) from (b); you'd need to pull information from outside of the registry. I figured the two situations can be resolved by context: one happens mostly at the beginning of the sim, and the other mostly at the end.
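A minimal sketch of the "inactive workers" count described above, assuming it is simply the number of workers originally requested minus the number currently registered with the queue (in practice that registered count would come from something like the `rq.Worker.all` lookup mentioned in the ticket). The function name and arguments are hypothetical, not the actual registry code:

```python
def count_inactive_workers(max_workers: int, registered_workers: int) -> int:
    """Hypothetical helper: workers 'missing' from the originally submitted count.

    A missing worker is either (a) still waiting on cluster resources or
    (b) already done with the queue and exited; this number alone cannot
    tell the two cases apart.
    """
    # Clamp at zero (an assumption) in case the registry briefly reports
    # more workers than were requested.
    return max(0, max_workers - registered_workers)


# e.g. 100 workers requested, 37 currently registered with the queue
inactive = count_inactive_workers(max_workers=100, registered_workers=37)
```

Early in a run most of the inactive workers are case (a); near the end they are mostly case (b), which is the "resolve by context" argument above.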

Also, "running" and "workers" should effectively be the same, unless rq.Worker.all also counts workers in the queue that are not performing a job (but in that case, shouldn't they quit soon?)

I took out "workers", assuming the same information is given by "running".

Also consider renaming "Finished" to "Successful" or "Completed"

I renamed this job status (but not the underlying FinishedQueue) to "Successful" for logging purposes.
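One way to keep the rename display-only is a label mapping applied at log time, so the internal "finished" key and the queue it comes from keep their names. This is a hypothetical sketch; `DISPLAY_LABELS` and `format_status` are illustrative names, not functions in the registry:

```python
from typing import Dict

# Hypothetical mapping from internal status keys to the labels shown in logs.
# Only the label changes; the "finished" key itself is left untouched.
DISPLAY_LABELS: Dict[str, str] = {
    "pending": "Pending",
    "running": "Running",
    "failed": "Failed",
    "finished": "Successful",
}


def format_status(status: Dict[str, int]) -> str:
    """Render a status dict as one log line, falling back to Title Case for unmapped keys."""
    return ", ".join(
        f"{DISPLAY_LABELS.get(key, key.title())}: {value}" for key, value in status.items()
    )


line = format_status({"pending": 5, "finished": 3})
```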

I also downgraded some log statements from info to debug where, IMO, that level is more appropriate.
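For reference, the effect of moving a message from info to debug, shown with Python's standard `logging` module. This assumes nothing about which logger psimulate actually uses; the logger name, handler, and message text are made up for the sketch:

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects emitted messages so we can see what actually got through."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


logger = logging.getLogger("queue_logging_sketch")  # hypothetical logger name
logger.propagate = False  # keep the demo out of the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.INFO)  # default verbosity: debug chatter is hidden
logger.info("Queue status: Pending 5, Running 3")  # still emitted
logger.debug("Retrieved 3 results in 0.12s")  # suppressed at INFO
emitted_at_info = list(handler.messages)

logger.setLevel(logging.DEBUG)  # e.g. a verbose run
logger.debug("Retrieved 3 results in 0.12s")  # now emitted
```

At the default level, only the queue summary shows up; the per-fetch detail only appears when someone explicitly asks for verbose output.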

Testing

Tested against nutrition optimization, but I am not actually getting allocated any workers atm

rmudambi

Can you include a screenshot of what the logging output looks like when you are able to get scheduled? I'll approve after I see it look as expected.

N.B. - you can use psimulate test sleep rather than running an actual simulation to reduce our cluster impact.

stevebachmeier · 2024-01-05T15:14:42Z

Like Rajan, I'd also like to actually see a screenshot of the logging in action.

stevebachmeier · 2024-01-05T15:15:43Z

src/vivarium_cluster_tools/psimulate/redis_dbs/registry.py

        finished_jobs = self._get_finished_jobs()
        start = time.time()
        results = []
        for job_id in finished_jobs:
            result = self._get_result(job_id)
            if result is not None:
                results.append(result)
-        self._logger.info(
+        self._logger.debug(


I like this change

stevebachmeier · 2024-01-05T15:17:03Z

src/vivarium_cluster_tools/psimulate/redis_dbs/registry.py

                )
                self._status["workers"] = q_workers
-                self._status["done"] = 100 * self._status["finished"] / self._status["total"]
+                self._status["done"] = (


so "done" for this context is basically finished + successful?

Yeah, I don't think that the "done" percentage is including failed jobs
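The new "done" expression is cut off in the diff above, so this is only a sketch assuming it mirrors the old `100 * finished / total` with the successful count, and, per the reply, excludes failed jobs. The zero-total guard is an added assumption:

```python
def percent_done(successful: int, total: int) -> float:
    """Hypothetical: share of all jobs that completed successfully, as a percentage.

    Failed jobs are deliberately not counted toward "done".
    """
    if total == 0:
        # Nothing queued yet; avoid dividing by zero.
        return 0.0
    return 100 * successful / total


done = percent_done(successful=30, total=120)
```

Under this reading, a run with failures never reaches 100% done, which makes failures visible in the progress line.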

patricktnast · 2024-01-05T17:34:22Z

@rmudambi @stevebachmeier here is a screenshot of the logs (with and without individual queues)

rmudambi

Looks good. To be clear, we expect these two constraints to hold?

max_workers == inactive_workers + running
total_jobs == pending + running + failed + successful
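The two constraints can be written down as a quick sanity check on a status snapshot. This sketch assumes a dict keyed by the bucket names above (using "total" for total_jobs) and treats "running" as both running jobs and active workers, since "workers" was dropped as redundant. The function name is hypothetical, and a live snapshot could violate these briefly if counts are read at slightly different moments:

```python
from typing import Dict, List


def check_queue_invariants(status: Dict[str, int], max_workers: int) -> List[str]:
    """Hypothetical check: return a description of each violated constraint (empty if all hold)."""
    problems: List[str] = []

    # Every requested worker is either running a job or inactive.
    workers_seen = status["inactive_workers"] + status["running"]
    if workers_seen != max_workers:
        problems.append(
            f"max_workers={max_workers} but inactive_workers + running = {workers_seen}"
        )

    # Every job sits in exactly one bucket.
    jobs_seen = status["pending"] + status["running"] + status["failed"] + status["successful"]
    if jobs_seen != status["total"]:
        problems.append(f"total={status['total']} but bucket sum = {jobs_seen}")

    return problems


snapshot = {"pending": 10, "running": 4, "failed": 1, "successful": 5,
            "total": 20, "inactive_workers": 6}
issues = check_queue_invariants(snapshot, max_workers=10)
```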

patricktnast added 7 commits January 4, 2024 11:16

add gitignore for vscode

06e80ca

adjust logging

3a3bc68

lint

5335465

rename "Finished" to "Successful"

2a8107a

remove workers

2ab1d93

lint

5831059

fix test

a8a8f94

patricktnast marked this pull request as ready for review January 4, 2024 22:29

patricktnast requested review from albrja, collijk, hussain-jafari, mattkappel, rmudambi and stevebachmeier as code owners January 4, 2024 22:29

rmudambi reviewed Jan 5, 2024

stevebachmeier reviewed Jan 5, 2024

Merge branch 'main' into bugfix/pnast/MIC-4650-queue-buckets

7ec55cc

stevebachmeier self-requested a review January 5, 2024 18:51

stevebachmeier approved these changes Jan 5, 2024

rmudambi approved these changes Jan 5, 2024

Merge branch 'main' into bugfix/pnast/MIC-4650-queue-buckets

93a4892

patricktnast merged commit d228061 into main Jan 8, 2024
6 checks passed

patricktnast deleted the bugfix/pnast/MIC-4650-queue-buckets branch January 8, 2024 18:20