dpdisp-submit
This Skill guides the Agent to use the DPDispatcher tool to convert Shell commands into computational jobs and submit them to local machines or High-Performance Computing (HPC) clusters (supporting environments such as Shell, Slurm, PBS, LSF, Bohrium, etc.).
Syntax & Protocol
This section defines the field mappings, variable syntax, and special flags for the configuration file.
Protocol Acquisition (Initialize)
As an Agent, before gathering information and building the configuration, you MUST FIRST execute the following command to read and learn the latest Schema protocol specifications and requirements:
uvx --with dpdispatcher dargs doc dpdispatcher.entrypoints.submit.submission_args
Field Mapping
You must accurately translate the gathered user requirements into the following core JSON hierarchy:
- machine: Defines the execution location and method (connection protocol, scheduler type).
- resources: Defines the computational resource requirements (nodes, CPUs, GPUs, queues, etc.).
- task_list: Defines the specific Shell commands to be executed and the files to upload before, and download after, each task.
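Here is a minimal sketch of how the three sections nest, written out with a shell heredoc. Every value is a placeholder, and the output filename submission.skeleton.json is hypothetical. The Shell/LocalContext machine and the resource fields (number_node, cpu_per_node, gpu_per_node, group_size) are assumptions based on DPDispatcher's commonly used arguments, and the schema may require fields this sketch omits. Confirm each field against the schema printed by the dargs doc command before using it.

```shell
# Hypothetical skeleton: every value is a placeholder to be replaced with the
# user's real requirements after checking the schema printed by `dargs doc`.
# "machine" = where/how to run, "resources" = what to request,
# "task_list" = what to run and which files move in and out.
cat > submission.skeleton.json <<'EOF'
{
  "work_base": ".",
  "machine": {
    "batch_type": "Shell",
    "context_type": "LocalContext",
    "remote_root": "/tmp/dpdisp_remote"
  },
  "resources": {
    "number_node": 1,
    "cpu_per_node": 1,
    "gpu_per_node": 0,
    "queue_name": "",
    "group_size": 1
  },
  "task_list": [
    {
      "command": "bash run.sh",
      "task_work_path": "task01",
      "forward_files": ["run.sh"],
      "backward_files": ["result.out"]
    }
  ]
}
EOF
```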
Environment Variable Syntax & Injection Rules
If user-specified values (e.g., username, Token, remote path) need to be read from local environment variables, you must strictly use the ${ENV_VAR_NAME} format in the template.
- Example:
"remote_root": "${USER_HPC_WORKSPACE}"
Reference & Reuse
The protocol allows the {"$ref": "other.json"} syntax for nesting reusable configuration snippets from other JSON files. The referenced file is loaded first, and the current file’s fields then override or extend it. A relative $ref path is resolved against the execution directory, which must be the directory containing submission.json. Always run the commands from that directory, and make sure the relative path points to the intended file.
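One way to enforce that rule before validation is to check from the shell that you are in the directory holding the configuration, and that the relative $ref target exists there. check_ref_target is a hypothetical helper written for illustration. It encodes the resolution rule stated above and is not DPDispatcher's own resolver.

```shell
# Hypothetical pre-flight helper (not part of DPDispatcher).
# Usage: check_ref_target CONFIG REF_PATH
# Succeeds only if the shell is in CONFIG's directory and REF_PATH,
# taken relative to that directory, names an existing file.
check_ref_target() {
  config=$1
  ref=$2
  # CDPATH is cleared so `cd` cannot print a path into the substitution.
  config_dir=$(CDPATH= cd -- "$(dirname -- "$config")" && pwd -P) || return 1
  if [ "$(pwd -P)" != "$config_dir" ]; then
    echo "error: run from $config_dir so that relative \$ref paths resolve there" >&2
    return 1
  fi
  if [ ! -f "$ref" ]; then
    echo "error: \$ref target $ref not found relative to $config_dir" >&2
    return 1
  fi
}
```

With the directory layout from the Example section, running check_ref_target submis_task02.template.json ../resource_defaults.json from inside run_dir/ should succeed. Running the same check from <WORKSPACE> fails, which is the mismatch the rule warns about.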
Path Resolution Rules
- Base Directory (work_base): Defines the base working directory for all tasks, typically set to "." (i.e., the current execution directory).
- Task and File Path Resolution: task_work_path is resolved relative to work_base, whereas the file paths specified in forward_files are strictly resolved relative to task_work_path.
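Here is a concrete case of the two-level resolution. With work_base ".", task_work_path "task02" and a forward file "data/input.dat", the file is read from ./task02/data/input.dat. The hypothetical check_forward_files helper below sketches a local pre-flight check built on that rule. It only tests whether files exist and is not part of DPDispatcher.

```shell
# Hypothetical pre-flight helper (not part of DPDispatcher).
# Usage: check_forward_files WORK_BASE TASK_WORK_PATH FILE...
# Reports every forward file missing from WORK_BASE/TASK_WORK_PATH.
check_forward_files() {
  task_dir=$1/$2        # task_work_path is relative to work_base
  shift 2
  status=0
  for f in "$@"; do
    # forward_files are relative to task_work_path, never to work_base
    if [ ! -e "$task_dir/$f" ]; then
      echo "missing forward file: $task_dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

A common mistake is to prefix the task directory a second time. Listing task02/run_simulation.sh as a forward file makes the helper look for ./task02/task02/run_simulation.sh and report it as missing.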
Dry-Run Testing (--dry-run)
The --dry-run flag parses the configuration, generates the local directories, and validates the Schema, but DOES NOT actually submit the job to the machine or cluster. Use it as a final safety check before real execution.
Execution Workflow
As an Agent, you MUST strictly execute tasks in the sequence of the following stages, without skipping any steps:
Information Gathering
If you are uncertain about any parameter or configuration detail needed to run the job, you MUST proactively ask the user, in natural language, to supply the missing information.
Secure Build
You MUST generate the configuration file based on the acquired Schema protocol and the gathered information.
- Pure Static Configuration: If no environment variable injection is needed, directly generate the final submission.json.
- Environment Variable Injection Required:
  - You must generate a submission.template.json file, using the ${VAR_NAME} syntax ONLY for the variables that need to be replaced.
  - You must use the envsubst command and explicitly list the variables to be replaced, to prevent unrelated $… symbols in the JSON (such as "$ref") from being accidentally expanded.
  - Example:
    envsubst '${USER_HPC_WORKSPACE} ${USER_OTHER_VAR}' < submission.template.json > submission.json
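The rendering step can be wrapped in a small function that refuses to render when a listed variable is missing, because envsubst sees only exported variables and silently turns a missing one into an empty string. It then passes the explicit list to envsubst and confirms that no ${...} placeholder survived, without ever printing the sensitive output file. render_config is a hypothetical name used for illustration. The leftover check assumes the template contains no ${...} meant for the remote shell. Drop that check if a command string deliberately uses one.

```shell
# Hypothetical helper (not part of DPDispatcher).
# Usage: render_config TEMPLATE OUTPUT VAR_NAME...
# Substitutes ONLY the listed variables and never prints the rendered file.
render_config() {
  template=$1
  output=$2
  shift 2
  if ! command -v envsubst >/dev/null 2>&1; then
    echo "error: envsubst not found (it is part of GNU gettext)" >&2
    return 1
  fi
  vars=
  for name in "$@"; do
    # printenv sees only exported variables, as envsubst does; its output
    # is captured for the emptiness test and never displayed.
    if [ -z "$(printenv "$name")" ]; then
      echo "error: $name is not exported or is empty" >&2
      return 1
    fi
    vars="$vars \${$name}"   # builds e.g. ' ${HPC_USER} ${HPC_WORKDIR}'
  done
  envsubst "$vars" < "$template" > "$output" || return 1
  # "$ref" is left alone because it is not listed; a remaining "${"
  # means some placeholder was never listed.
  if grep -q '\${' "$output"; then
    echo "error: unresolved \${...} placeholder left in $output" >&2
    return 1
  fi
}
```

Usage mirrors the envsubst example above: render_config submission.template.json submission.json USER_HPC_WORKSPACE USER_OTHER_VAR.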
Validate & Submit
You MUST strictly execute the following command chain in sequence.
Note: If the $ref syntax is used in the configuration, you must pass the --allow-ref flag to all validation and submission commands, otherwise parsing or validation will fail even if the JSON content is correct.
# Logic and Schema Validation
uvx --with dpdispatcher dargs check -f dpdispatcher.entrypoints.submit.submission_args submission.json
# Submit Job
uvx --from dpdispatcher dpdisp submit submission.json
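The chain should stop at the first failure, and the --dry-run flag described above is meant as a last check before real execution, so the validation and the dry run can be combined into one wrapper. preflight is a hypothetical function. The ordering (schema check, then dry run) is a suggestion rather than a DPDispatcher requirement. Once it succeeds, launch the real dpdisp submit as described under Long-Running Jobs.

```shell
# Hypothetical wrapper (not part of DPDispatcher).
# Usage: preflight CONFIG [--allow-ref]
# Runs schema validation, then a dry run; stops at the first failure.
preflight() {
  config=$1
  ref_flag=${2-}
  if [ ! -f "$config" ]; then
    echo "error: $config not found" >&2
    return 1
  fi
  # 1. Logic and Schema validation (--allow-ref only when $ref is used)
  uvx --with dpdispatcher dargs check ${ref_flag:+"$ref_flag"} \
    -f dpdispatcher.entrypoints.submit.submission_args "$config" || return 1
  # 2. Dry run: parse and prepare locally, submit nothing
  uvx --from dpdispatcher dpdisp submit --dry-run ${ref_flag:+"$ref_flag"} "$config"
}
```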
Reporting Standard
After execution finishes, you MUST output a structured report to the user with the following fixed elements:
- Task Summary: Briefly describe the user’s request (execution location, executed command, allocated resources).
- Current State: Explicitly point out the status of the job (started / running / finished / failed).
- Artifact Path: Explicitly point out the location of the output files (for example, when task_work_path is ".", point out the specific paths of log and err).
- Exception Guidance: If the job encounters an interruption or partial failure, provide the user with detailed issue information and proceed according to the user’s further instructions.
Long-Running Jobs
High-performance computing tasks usually take an extremely long time (from hours to weeks), and there is a long time gap between the submission command and the final result. This is not a one-off, instant Q&A process, and you must choose the appropriate disconnect-prevention execution mode based on the specific scenario:
Blocking Mode
- Wrap in tmux: Run the standard dpdisp submit submission.json. The program keeps running until the job has actually finished on the cluster and the output files have been downloaded, and only then exits. You must run it inside a tmux session so that a disconnection cannot kill the process.
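When the blocking command runs in a detached tmux session, the session closes as soon as dpdisp submit exits, and its output disappears with it. The sketch below keeps both the output and the exit code in files. start_blocking_submit and the file names dpdisp_submit.log and dpdisp_submit.exit are hypothetical. It also assumes the configuration filename contains no spaces or quotes and that tmux's default shell is POSIX-compatible.

```shell
# Hypothetical helper (not part of DPDispatcher).
# Usage: start_blocking_submit CONFIG SESSION [--allow-ref]
start_blocking_submit() {
  config=$1
  session=$2
  ref_flag=${3-}
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "error: tmux session $session already exists" >&2
    return 1
  fi
  rm -f dpdisp_submit.exit   # drop a stale exit code from an earlier run
  cmd="uvx --from dpdispatcher dpdisp submit ${ref_flag:+$ref_flag }$config"
  # The log and exit-code files outlive the tmux session, which closes
  # as soon as the command finishes.
  tmux new-session -d -s "$session" -c "$PWD" \
    "$cmd > dpdisp_submit.log 2>&1; echo \$? > dpdisp_submit.exit"
}
```

After start_blocking_submit submission.json dpdisp_job --allow-ref, you can watch progress with tmux attach -t dpdisp_job and detach again with Ctrl-b d. Once the session is gone, dpdisp_submit.exit holds the exit code to report.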
Non-blocking Mode
- Use the --exit-on-submit flag: Once the job has been handed over to the scheduling system (e.g., Slurm), the program exits immediately and returns its exit code. It does not wait for execution to complete or download outputs.
- State Definition: In this mode, you must strictly distinguish between the following two states for the user:
  - Successfully Submitted (Submitted): The command with the flag has just finished and returned 0. The backend has only accepted the job, which may still be queuing, and output files are not yet available.
  - Fully Completed (Completed): After the synchronization command is re-run later, the backend task has finished successfully AND all required output files have been retrieved to the local machine.
- Idempotent Recovery Principle (Resuming Jobs): DPDispatcher has built-in state tracking and idempotency. It automatically resumes unfinished tasks and does not re-execute completed ones.
  - Trigger Conditions: State synchronization and file downloading in non-blocking mode; a job that fails, times out, or is unexpectedly interrupted; a user who explicitly asks to “resume” or “retry”; or your own SSH/network connection dropping during monitoring.
  - Recovery Action: You do not need to modify submission.json or clean up the remote directory. Simply re-execute the exact same submission command (e.g., uvx --from dpdispatcher dpdisp submit submission.json --allow-ref), unchanged, in the same directory.
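The two states map onto two wrappers. The first submits with --exit-on-submit and reports Submitted. The second re-runs the same command without the flag and reports Completed only on a zero exit code. Both functions are hypothetical sketches. Treating a zero exit code from the blocking run as "outputs retrieved" is an assumption based on the blocking-mode behavior described above.

```shell
# Hypothetical wrappers (not part of DPDispatcher).
# Usage: submit_nonblocking CONFIG [--allow-ref]
# Hands the job to the scheduler and returns at once; prints "Submitted".
submit_nonblocking() {
  uvx --from dpdispatcher dpdisp submit --exit-on-submit ${2:+"$2"} "$1"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "error: submission failed with exit code $rc" >&2
    return "$rc"
  fi
  echo "Submitted"
}

# Usage: sync_job CONFIG [--allow-ref]
# Re-runs the same submission WITHOUT --exit-on-submit, so DPDispatcher
# resumes its recorded state, waits for the job and downloads the outputs.
sync_job() {
  uvx --from dpdispatcher dpdisp submit ${2:+"$2"} "$1"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "error: job not completed (exit code $rc)" >&2
    return "$rc"
  fi
  echo "Completed"
}
```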
Timeline Example (Non-blocking Mode):
- [Day 1, 10:00] Submit the job: dpdisp submit --exit-on-submit submis_task.json
- [Day 1, 10:01] The command exits immediately and returns 0. At this point the job is only in the Successfully Submitted state, and the Agent can leave the terminal to work on other tasks.
- [Waiting Period] A long queuing and calculation phase, potentially lasting days.
- [Day 3, 15:00] The Agent returns to the directory to check. It triggers the idempotent recovery mechanism by re-running dpdisp submit submis_task.json as is, without the flag, to synchronize the state and trigger file downloading. Only after the download is complete is the job marked as Fully Completed.
Strict Guardrails
Before performing any operation, as an Agent, you MUST UNCONDITIONALLY obey the following security baselines:
- Direct SSH Connections are Strictly Prohibited: You are absolutely not allowed to connect directly to the remote HPC using ssh, write custom Paramiko/Fabric Python scripts, or manually execute remote commands. All remote connections, file transfers, and job management MUST be handled exclusively by DPDispatcher, by generating submission.json and calling the dpdisp submit tool.
- Reading External Reference JSON Files is Strictly Prohibited: If the user provides a JSON file to supply certain information, you are ABSOLUTELY PROHIBITED from reading or printing the contents of that file. The file contains raw sensitive data, and reading it would leak confidential information into the current conversation context.
- Reading Configuration Files with Sensitive Data is Strictly Prohibited: After injecting environment variables via envsubst to generate the final submission.json, you are ABSOLUTELY PROHIBITED from reading or printing the contents of that file. The file contains raw sensitive data, and reading it would leak confidential information into the current conversation context.
Example
User request: “Please run the simulation located in the task02 directory on my Slurm cluster. Load my username from $HPC_USER and the workspace path from $HPC_WORKDIR. We already have a resource_defaults.json in the parent workspace directory, please reference it and just add the debug queue.”
The Agent discovers that the current directory structure is as follows:
<WORKSPACE>/
├── resource_defaults.json
├── ...
└── run_dir/
├── ...
└── task02/
├── run_simulation.sh
├── ...
└── data/
├── input.dat
└── ...
The Agent decides to create the configuration file submis_task02.template.json within the run_dir/ directory (at the same level as the task02/ folder).
The Agent keeps three points in mind: the $ref must point to the parent directory (../), task_work_path must explicitly target "task02", and forward_files must remain strictly relative to that task_work_path.
It then writes the template:
{
"work_base": ".",
"machine": {
"batch_type": "Slurm",
"context_type": "SSHContext",
"remote_profile": {
"hostname": "<target-host>",
"username": "${HPC_USER}",
"port": 22
},
"remote_root": "${HPC_WORKDIR}/dpdisp_run"
},
"resources": {
"$ref": "../resource_defaults.json",
"queue_name": "debug",
"group_size": 1
},
"task_list": [
{
"command": "bash run_simulation.sh",
"task_work_path": "task02",
"forward_files": [
"run_simulation.sh",
"data/input.dat"
],
"backward_files": [
"result.out",
"log",
"err"
]
}
]
}
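Then, starting from the <WORKSPACE> directory, the Agent changes into run_dir/ (where the template lives), renders the final configuration, validates it, and starts the blocking submission in a detached tmux session: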
cd run_dir/
envsubst '${HPC_USER} ${HPC_WORKDIR}' < submis_task02.template.json > submis_task02.json
uvx --with dpdispatcher dargs check --allow-ref -f dpdispatcher.entrypoints.submit.submission_args submis_task02.json
tmux new-session -d -s dpdisp_task02 "uvx --from dpdispatcher dpdisp submit --allow-ref submis_task02.json"
tmux ls
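To report the Current State while the example job runs, the Agent can query the session without attaching to it. report_session is a hypothetical helper, and the 20-line tail is an arbitrary choice. A session that has exited only means dpdisp submit returned, not that it succeeded. Confirm the outcome before reporting finished or failed, for example by re-running the same submission command, which resumes rather than repeats the work.

```shell
# Hypothetical helper (not part of DPDispatcher).
# Usage: report_session SESSION
report_session() {
  session=$1
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "$session: running"
    # Print the last lines shown in the session's pane, without attaching.
    tmux capture-pane -p -t "$session" | tail -n 20
  else
    echo "$session: exited"
  fi
}
```

For the example above: report_session dpdisp_task02.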