Adds `--spindle-session` argument to salloc and sbatch which runs jobs in that allocation with a spindle session.
Adds support for sessions to the SPANK plugin.
To ensure that sessions end when the allocation does, sessions are made part of the allocation. An allocation uses a session by passing `--spindle-session` as an argument to `salloc` or `sbatch`. This differs from the other launchers, which use `spindle --start-session`: the session is automatically started when the first step runs, and is automatically ended when the allocation ends.

The major complication is that we have to start the session on every node of the job, even if the first step doesn't run on every node. When using rshlaunch, the first step uses rshlaunch to start the session on every node of the job. When not using rshlaunch, we have to use a dummy `srun` on every node to cause the job prolog to run everywhere. The srun runs `/bin/true` and so exits immediately, not consuming any resources. However, this has the side effect of using up a step ID. This could interfere with job scripts that assume the step IDs instead of checking. For this reason, the rshlaunch configuration may be preferred.

To signal session end, a Unix socket is used. This is very similar to, and adapted from, the code used for the "exit note" with `OPT_BEEXIT`.

Almost all of the changes are restricted to the SPANK plugin itself. The only changes outside the SPANK plugin are to the testsuite: because the sessions are allocation-scoped, session and non-session tests must run in two different allocations. When the resource manager is `slurm_plugin`, the test scripts check whether a session is being used in the current allocation and run either the session or non-session tests accordingly. The CI workflow for the plugin tests runs `./runTests` twice, once outside a session and once inside.
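
For reviewers unfamiliar with the pattern, the Unix-socket signalling used for session end can be sketched roughly as below. This is only an illustration of the general technique, not the code in this PR: the function names, the one-byte message, and the socket path handling are all hypothetical, and the real implementation (adapted from the `OPT_BEEXIT` exit note) may differ in every detail.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical one-byte message meaning "the session is over". */
#define SESSION_END_NOTE 'E'

/* Build a filesystem Unix-socket address; returns -1 if the path is too long. */
static int make_addr(const char *path, struct sockaddr_un *addr)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Session side (hypothetical): create a listening socket at path.
   Returns the listening fd, or -1 on error. */
int session_end_listen(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (make_addr(path, &addr) == -1)
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    unlink(path); /* remove a stale socket file left by an earlier run */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 4) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Session side (hypothetical): block until a peer connects and sends the note.
   Returns 0 if the end note arrived, -1 otherwise. */
int session_end_wait(int listen_fd)
{
    char note = 0;
    ssize_t n;
    int conn = accept(listen_fd, NULL, NULL);

    if (conn == -1)
        return -1;
    n = read(conn, &note, 1);
    close(conn);
    return (n == 1 && note == SESSION_END_NOTE) ? 0 : -1;
}

/* Allocation-end side (hypothetical): connect to the socket and send the note.
   Returns 0 on success, -1 on error. */
int session_end_notify(const char *path)
{
    struct sockaddr_un addr;
    char note = SESSION_END_NOTE;
    ssize_t n;
    int fd;

    if (make_addr(path, &addr) == -1)
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    n = write(fd, &note, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

In this sketch the long-lived session process would call `session_end_listen` once and then block in `session_end_wait`, while whatever runs at allocation end calls `session_end_notify` with the same path.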