Wrapper: Monitor status of child task process even if parent exits #3307

Open
wants to merge 8 commits into
base: master

Conversation

tristanolive
Contributor

Description of the Change
On Windows, CreateProcess() is used to launch tasks, but this on its own does not handle child processes; if the parent task process exits, the workunit will be terminated. If <wait_for_children> is set in the job file, attach the task process to a job object instead, which can then be monitored to determine when all child processes are finished.
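
The approach described above can be sketched roughly as follows. This is not the code from this PR; `JobLaunch` and `launch_in_job` are hypothetical names, and the sketch assumes the job object reports to an I/O completion port, which the later mention of GetQueuedCompletionStatus() suggests. The Win32 part only compiles on Windows.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>

// Hypothetical holder for the handles a wrapper would keep per task.
struct JobLaunch {
    HANDLE job;
    HANDLE port;
    PROCESS_INFORMATION pi;
};

// Start cmdline suspended, place it in a fresh job object, then let it run.
// Returns false (after releasing whatever was opened) if any step fails.
bool launch_in_job(char* cmdline, JobLaunch& out) {
    ZeroMemory(&out, sizeof(out));
    out.job = CreateJobObjectA(NULL, NULL);
    out.port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1);
    bool ok = out.job != NULL && out.port != NULL;

    if (ok) {
        // Ask the job to post its notifications to the completion port.
        JOBOBJECT_ASSOCIATE_COMPLETION_PORT assoc;
        assoc.CompletionKey = out.job;
        assoc.CompletionPort = out.port;
        ok = SetInformationJobObject(out.job,
                 JobObjectAssociateCompletionPortInformation,
                 &assoc, sizeof(assoc)) != 0;
    }
    if (ok) {
        STARTUPINFOA si;
        ZeroMemory(&si, sizeof(si));
        si.cb = sizeof(si);
        // CREATE_SUSPENDED keeps the task from spawning children
        // before it has been assigned to the job.
        ok = CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                 CREATE_SUSPENDED, NULL, NULL, &si, &out.pi) != 0;
    }
    if (ok && !AssignProcessToJobObject(out.job, out.pi.hProcess)) {
        TerminateProcess(out.pi.hProcess, 1);  // never resumed, so just discard it
        ok = false;
    }
    if (ok) {
        ResumeThread(out.pi.hThread);
        return true;
    }
    if (out.pi.hThread) CloseHandle(out.pi.hThread);
    if (out.pi.hProcess) CloseHandle(out.pi.hProcess);
    if (out.port) CloseHandle(out.port);
    if (out.job) CloseHandle(out.job);
    return false;
}
#endif
```

Children created by the task stay in the same job by default, so the job's notifications cover them even after the original task process has exited.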

Alternate Designs

Release Notes

Add <wait_for_children> option for tasks in job.xml

@AenBleidd
Member

Closing and reopening to rerun CI builds

@AenBleidd AenBleidd closed this Sep 26, 2019
@AenBleidd AenBleidd reopened this Sep 26, 2019
When using a job object to handle child processes, the status should still return 0 on success. If a child exits abnormally, the completion code is set to JOB_OBJECT_MSG_ABNORMAL_EXIT_PROCESS, so just return that for now.
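
A minimal sketch of the status mapping this commit message describes, assuming the job's notifications are read from a completion port. `job_message_to_status` is a hypothetical helper, not a function from the wrapper; the fallback constants carry the values from winnt.h so the sketch also compiles off Windows.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
// Values copied from winnt.h so the mapping can be compiled anywhere.
constexpr unsigned long JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO = 4;
constexpr unsigned long JOB_OBJECT_MSG_ABNORMAL_EXIT_PROCESS = 8;
#endif

// Hypothetical helper: returns true when the message means the task is done,
// and writes the status the wrapper would report.
bool job_message_to_status(unsigned long msg, int& status) {
    switch (msg) {
    case JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO:
        status = 0;  // every process in the job has exited: success
        return true;
    case JOB_OBJECT_MSG_ABNORMAL_EXIT_PROCESS:
        status = static_cast<int>(msg);  // pass the message code through as the status
        return true;
    default:
        return false;  // e.g. a new-process notification: keep polling
    }
}
```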
Add routines for handling kill(), stop(), and resume() calls on tasks that use the <wait_for_children> option
The job_handle for job objects is only relevant on Windows, so it should not be referenced outside of _WIN32 blocks
@tristanolive
Contributor Author

Closing and reopening again to get CI builds to run.

@davidpanderson
Contributor

Why is this needed? Normally programs that create children wait for them to finish.

@Rytiss
Contributor

Rytiss commented Oct 24, 2019

Normally they do, but we are trying to wrap an application that does not follow the normal pattern. It spawns child processes and exits, which in turn makes the wrapper, and thus BOINC, think the app has finished its computation.

The get_job_object_processes() function was not providing a complete list of PIDs, as the cbJobObjectInformationLength parameter passed to QueryInformationJobObject() needed to be larger. It should now accommodate up to 32 processes in the job object.
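
For illustration, the buffer sizing could look something like the sketch below. It is not the PR's get_job_object_processes(); `pid_list_buffer_size`, `list_job_pids`, and `MAX_JOB_PIDS` are hypothetical names, and the non-Windows struct is a stand-in that mirrors the winnt.h layout so the size arithmetic can be compiled anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
// Stand-in mirroring the winnt.h layout, for the size arithmetic only.
struct JOBOBJECT_BASIC_PROCESS_ID_LIST {
    std::uint32_t NumberOfAssignedProcesses;
    std::uint32_t NumberOfProcessIdsInList;
    std::uintptr_t ProcessIdList[1];
};
#endif

const std::size_t MAX_JOB_PIDS = 32;  // cap mentioned in the commit message

// Bytes needed for a PID list holding max_pids entries.
// The struct already contains one slot, hence max_pids - 1 extra.
std::size_t pid_list_buffer_size(std::size_t max_pids) {
    assert(max_pids >= 1);
    return sizeof(JOBOBJECT_BASIC_PROCESS_ID_LIST)
         + (max_pids - 1) * sizeof(std::uintptr_t);
}

#ifdef _WIN32
// Returns the PIDs currently in the job, or an empty list if the query fails.
std::vector<std::uintptr_t> list_job_pids(HANDLE job) {
    std::vector<unsigned char> buf(pid_list_buffer_size(MAX_JOB_PIDS));
    JOBOBJECT_BASIC_PROCESS_ID_LIST* list =
        reinterpret_cast<JOBOBJECT_BASIC_PROCESS_ID_LIST*>(buf.data());
    std::vector<std::uintptr_t> pids;
    if (QueryInformationJobObject(job, JobObjectBasicProcessIdList, list,
                                  static_cast<DWORD>(buf.size()), NULL)) {
        pids.assign(list->ProcessIdList,
                    list->ProcessIdList + list->NumberOfProcessIdsInList);
    }
    return pids;
}
#endif
```

Passing only sizeof(JOBOBJECT_BASIC_PROCESS_ID_LIST) leaves room for a single PID, which is consistent with the incomplete list described above.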

Also related to job control, having no timeout set in the GetQueuedCompletionStatus() call was causing task polling to hang indefinitely when a child process launched another child process. Set the timeout to 3000ms to prevent this.
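
A rough sketch of polling with a timeout, under the assumption that the wrapper reads job notifications from a completion port. `PollResult`, `classify_poll`, and `poll_job_port` are hypothetical names; the decision logic is split out so it can be compiled and checked without Windows.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
constexpr unsigned long WAIT_TIMEOUT = 258;  // value from winerror.h
#endif

enum class PollResult { Message, Timeout, Error };

// Interpret the outcome of one dequeue attempt.
PollResult classify_poll(bool dequeued, unsigned long last_error) {
    if (dequeued) return PollResult::Message;
    return last_error == WAIT_TIMEOUT ? PollResult::Timeout : PollResult::Error;
}

#ifdef _WIN32
// Wait at most timeout_ms for the next job notification.
// On PollResult::Message, message holds a JOB_OBJECT_MSG_* code.
PollResult poll_job_port(HANDLE port, DWORD timeout_ms, DWORD& message) {
    ULONG_PTR key = 0;
    LPOVERLAPPED pid = NULL;  // for job notifications this slot carries a PID
    BOOL ok = GetQueuedCompletionStatus(port, &message, &key, &pid, timeout_ms);
    return classify_poll(ok != 0, ok ? 0 : GetLastError());
}
#endif
```

With a call such as `poll_job_port(port, 3000, msg)`, a `Timeout` result hands control back to the wrapper's polling loop instead of blocking it, which is the behaviour the 3000ms timeout is meant to restore.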
@ChristianBeer
Member

@tristanolive @Rytiss This PR has a conflict with current master. Could you please check if the changes are still needed and adjust the code accordingly?

@Rytiss
Contributor

Rytiss commented Jan 24, 2021

I believe the changes are still needed. @tristanolive - can you mod the code so that it does not conflict?
