Computational Chemistry Agent Skills

deepmd-train-dpa3

machine-learning-potentials
Train a DeePMD-kit model using the DPA3 descriptor with the PyTorch backend. Use when the user wants to train a state-of-the-art deep potential model based on message passing on Line Graph Series (LiGS). DPA3 provides high accuracy and strong generalization, suitable for large atomic models (LAM) and diverse chemical systems. Supports both fixed and dynamic neighbor selection.
v1.0 · Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended. Custom OP library required for LAMMPS deployment.

Installation

Install folder: deepmd-train-dpa3 · Repo path: machine-learning-potentials/deepmd-train-dpa3
Copy/paste this message to your OpenClaw agent.
Please install the OpenClaw skill "deepmd-train-dpa3" on the OpenClaw host.

Steps:
- Download: https://skills.computchem.cn/skill-zips/deepmd-train-dpa3.zip
- Unzip it to get deepmd-train-dpa3/
- Copy deepmd-train-dpa3/ into the workspace skills directory (<workspace>/skills/)
- Start a NEW OpenClaw session so the skill is loaded

Then verify:
openclaw skills list --eligible
openclaw skills info deepmd-train-dpa3
Prerequisites: Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended. Custom OP library required for LAMMPS deployment.

DeePMD-kit Training: DPA3

Train a deep potential model using the DPA3 descriptor, an advanced message-passing architecture operating on Line Graph Series (LiGS). DPA3 is designed as a large atomic model (LAM) with high fitting accuracy and robust generalization across diverse chemical and materials systems.

Quick Start

dp --pt train input.json

Agent Responsibilities

  1. Confirm the user has a working deepmd-kit environment with PyTorch backend.
  2. Collect the minimum required information:
    • Training data paths (deepmd/npy or deepmd/hdf5 format)
    • Validation data paths
    • Element types (type_map)
    • Target number of training steps
    • Model size preference (L3/L6/L12 layers)
  3. Generate a complete input.json training configuration.
  4. Decide whether to use fixed or dynamic neighbor selection based on system diversity.
  5. Run training and monitor the learning curve.
  6. Freeze the trained model and optionally test it.

Workflow

Step 1: Prepare Training Data

Same format as other DeePMD models. Each system directory should contain:

system_dir/
├── type.raw
├── type_map.raw
└── set.000/
    ├── coord.npy
    ├── energy.npy
    ├── force.npy
    ├── box.npy
    └── virial.npy

DPA3 also supports the mixed type data format for multi-element systems.
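Before training, it is worth verifying that the arrays in each system directory are mutually consistent. The sketch below is illustrative and not part of deepmd-kit; it assumes only numpy, and builds a toy 3-atom system to demonstrate the check (`check_system` and `demo_sys` are hypothetical names):

```python
# Sanity-check array shapes in a deepmd/npy system directory.
# Minimal sketch (not part of deepmd-kit); requires only numpy.
import os
import numpy as np

def check_system(system_dir):
    """Verify that the set.000 arrays agree with type.raw."""
    with open(os.path.join(system_dir, "type.raw")) as f:
        natoms = len(f.read().split())
    s = os.path.join(system_dir, "set.000")
    coord = np.load(os.path.join(s, "coord.npy"))    # (nframes, natoms*3)
    energy = np.load(os.path.join(s, "energy.npy"))  # (nframes,)
    force = np.load(os.path.join(s, "force.npy"))    # (nframes, natoms*3)
    box = np.load(os.path.join(s, "box.npy"))        # (nframes, 9)
    nframes = coord.shape[0]
    assert coord.shape == (nframes, natoms * 3), "coord shape mismatch"
    assert force.shape == coord.shape, "force shape mismatch"
    assert energy.shape[0] == nframes, "energy frame count mismatch"
    assert box.shape == (nframes, 9), "box shape mismatch"
    return nframes, natoms

# Build a toy 3-atom (H2O) system with 5 frames to demonstrate the check.
os.makedirs("demo_sys/set.000", exist_ok=True)
with open("demo_sys/type.raw", "w") as f:
    f.write("0 1 1\n")
with open("demo_sys/type_map.raw", "w") as f:
    f.write("O\nH\n")
rng = np.random.default_rng(0)
np.save("demo_sys/set.000/coord.npy", rng.random((5, 9)))
np.save("demo_sys/set.000/energy.npy", rng.random(5))
np.save("demo_sys/set.000/force.npy", rng.random((5, 9)))
np.save("demo_sys/set.000/box.npy", np.tile(np.eye(3).ravel(), (5, 1)))

print(check_system("demo_sys"))  # (5, 3)
```

Running this on each entry of training_data/systems before launching dp train catches the most common data-preparation mistakes (transposed arrays, frame-count mismatches) cheaply.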

Step 2: Write input.json

Standard DPA3 (fixed selection)

{
  "model": {
    "type_map": [
      "O",
      "H"
    ],
    "descriptor": {
      "type": "dpa3",
      "repflow": {
        "n_dim": 128,
        "e_dim": 64,
        "a_dim": 32,
        "nlayers": 6,
        "e_rcut": 6.0,
        "e_rcut_smth": 5.3,
        "e_sel": 120,
        "a_rcut": 4.0,
        "a_rcut_smth": 3.5,
        "a_sel": 30,
        "axis_neuron": 4,
        "fix_stat_std": 0.3,
        "a_compress_rate": 1,
        "a_compress_e_rate": 2,
        "a_compress_use_split": true,
        "update_angle": true,
        "smooth_edge_update": true,
        "edge_init_use_dist": true,
        "use_exp_switch": true,
        "update_style": "res_residual",
        "update_residual": 0.1,
        "update_residual_init": "const"
      },
      "activation_function": "silut:10.0",
      "use_tebd_bias": false,
      "precision": "float32",
      "concat_output_tebd": false,
      "seed": 1
    },
    "fitting_net": {
      "neuron": [
        240,
        240,
        240
      ],
      "resnet_dt": true,
      "precision": "float32",
      "activation_function": "silut:10.0",
      "seed": 1
    }
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3e-05
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.2,
    "limit_pref_e": 20,
    "start_pref_f": 100,
    "limit_pref_f": 60,
    "start_pref_v": 0.02,
    "limit_pref_v": 1
  },
  "optimizer": {
    "type": "AdamW",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "weight_decay": 0.001
  },
  "training": {
    "stat_file": "./dpa3.hdf5",
    "training_data": {
      "systems": [
        "./data/train_0",
        "./data/train_1",
        "./data/train_2"
      ],
      "batch_size": 1
    },
    "validation_data": {
      "systems": [
        "./data/valid_0"
      ],
      "batch_size": 1
    },
    "numb_steps": 1000000,
    "gradient_max_norm": 5.0,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 2000
  }
}

If you do not want to train on virials, set start_pref_v and limit_pref_v to 0.

DPA3 uses different default loss prefactors compared to SE_E2_A. See the comparison table in the “Key Differences from SE_E2_A” section below.
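The deepmd-kit documentation describes each loss prefactor as moving from its start_pref value to its limit_pref value as the learning rate decays exponentially from start_lr to stop_lr. A minimal sketch of that interplay follows; the interpolation formula and per-step rounding here are a reading of the documented behavior and may differ in detail from deepmd-kit's implementation:

```python
# Sketch of how start_pref/limit_pref interact with the exponential
# learning-rate schedule. Formulas follow the deepmd-kit documentation;
# exact step rounding in the real code may differ.

def lr_at(step, start_lr=1e-3, stop_lr=3e-5, decay_steps=5000,
          numb_steps=1_000_000):
    # decay_rate is chosen so the lr reaches stop_lr at the final step
    decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)
    return start_lr * decay_rate ** (step // decay_steps)

def pref_at(step, start_pref, limit_pref, start_lr=1e-3, **kw):
    # prefactor interpolates linearly in lr: start_pref at lr=start_lr,
    # limit_pref as lr -> 0
    r = lr_at(step, start_lr=start_lr, **kw) / start_lr
    return limit_pref * (1 - r) + start_pref * r

print(pref_at(0, start_pref=100, limit_pref=60))  # force prefactor at step 0: 100.0
print(pref_at(1_000_000, 100, 60))                # ~61.2, approaching limit_pref=60
```

With the DPA3 defaults above, the force term therefore dominates early training and the energy term gains weight as the run converges.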

Documentation for each parameter can be printed with dp doc-train-input. Because the full RST output is very long, pipe it through grep to find a specific parameter:

dp doc-train-input | grep -A 7 training/numb_steps
dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[dpa3\]/repflow/e_sel'

DPA3 with Dynamic Selection

For systems with highly variable neighbor counts (e.g., multi-element datasets), use dynamic selection by modifying the repflow section:

"repflow": {
  "e_sel": 1200,
  "a_sel": 300,
  "use_dynamic_sel": true,
  "sel_reduce_factor": 10.0
}

When use_dynamic_sel is true, the effective selection is e_sel / sel_reduce_factor and a_sel / sel_reduce_factor (i.e., 120 and 30 in this example), but the model dynamically adapts to varying neighbor counts.

Step 3: Run Training

dp --pt train input.json

To restart from a checkpoint:

dp --pt train input.json --restart model.ckpt.pt

Step 4: Monitor Training

The learning curve is written to lcurve.out with columns:

#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn  rmse_v_val  rmse_v_trn  lr
  • rmse_e_*: energy RMSE per atom (eV/atom)
  • rmse_f_*: force RMSE (eV/Å)
  • rmse_v_*: virial RMSE (eV/atom, only present if virial data is available)
  • lr: current learning rate
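A quick way to monitor a run is to parse lcurve.out directly. The sketch below is illustrative (the two-line synthetic file stands in for a real run); it uses numpy's comment-aware loadtxt and flags NaN rows, which indicate diverged training:

```python
# Parse lcurve.out and report the latest validation losses.
# Minimal monitoring sketch; the synthetic file stands in for a real run.
import numpy as np

with open("lcurve.out", "w") as f:
    f.write("#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  "
            "rmse_f_val  rmse_f_trn  lr\n")
    f.write("100 2.1e-1 2.3e-1 5.0e-3 5.5e-3 2.0e-1 2.2e-1 1.0e-3\n")
    f.write("200 1.8e-1 1.9e-1 4.1e-3 4.4e-3 1.7e-1 1.8e-1 9.9e-4\n")

with open("lcurve.out") as f:
    header = f.readline().lstrip("#").split()
data = np.loadtxt("lcurve.out")  # '#' header line is skipped automatically

assert not np.isnan(data).any(), "NaN in lcurve.out: training diverged"
latest = dict(zip(header, data[-1]))
print(f"step {latest['step']:.0f}: "
      f"rmse_e_val={latest['rmse_e_val']:.3e}, "
      f"rmse_f_val={latest['rmse_f_val']:.3e}")
```

The same parsing works unchanged when the virial columns are present, since the column names are taken from the header line.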

Step 5: Freeze the Model

dp --pt freeze -o model.pth

Step 6: Test the Model

dp --pt test -m model.pth -s /path/to/test_system -n 30

Model Size Guide

Choose the number of layers based on accuracy vs. cost trade-off:

| Model | nlayers | n_dim | e_dim | a_dim | Relative Cost | Use Case |
| --- | --- | --- | --- | --- | --- | --- |
| DPA3-L3 | 3 | 256 | 128 | 32 | 1x | Quick prototyping, smaller systems |
| DPA3-L3-small | 3 | 128 | 64 | 32 | 0.8x | Fast iteration, limited GPU memory |
| DPA3-L6 | 6 | 256 | 128 | 32 | 2x | Recommended for production |
| DPA3-L6-small | 6 | 128 | 64 | 32 | 1.4x | Good accuracy/cost balance |

Benchmark RMSE (averaged over 6 representative systems, 0.5M steps):

| Model | Energy (meV/atom) | Force (meV/Å) | Virial (meV/atom) |
| --- | --- | --- | --- |
| DPA3-L3 (256/128/32) | 5.74 | 85.4 | 43.1 |
| DPA3-L3-small (128/64/32) | 6.99 | 93.6 | 46.7 |
| DPA3-L6 (256/128/32) | 4.85 | 79.9 | 39.7 |
| DPA3-L6-small (128/64/32) | 5.11 | 77.7 | 41.2 |
| DPA2-L6 (reference) | 12.12 | 109.3 | 83.1 |

Key Differences from SE_E2_A

| Aspect | SE_E2_A | DPA3 |
| --- | --- | --- |
| Architecture | Two-body embedding | Message passing on LiGS |
| Default precision | float64 | float32 |
| Optimizer | Adam | AdamW (with weight_decay) |
| Loss prefactors | e: 0.02→1, f: 1000→1 | e: 0.2→20, f: 100→60, v: 0.02→1 |
| stop_lr | 3.51e-8 | 3e-5 |
| Gradient clipping | Not used | gradient_max_norm: 5.0 |
| Virial training | Optional | Recommended |
| Model compression | Supported | Not supported |
| Activation | tanh (default) | silut:10.0 |

Key Hyperparameters

Repflow (Descriptor)

| Parameter | Description | Default |
| --- | --- | --- |
| n_dim | Node embedding dimension | 128 or 256 |
| e_dim | Edge embedding dimension | 64 or 128 |
| a_dim | Angle embedding dimension | 32 |
| nlayers | Number of message-passing layers | 3 or 6 |
| e_rcut | Edge cutoff radius (Å) | 6.0 |
| e_rcut_smth | Edge smooth cutoff start (Å) | 5.3 |
| e_sel | Max edge neighbors | 120 |
| a_rcut | Angle cutoff radius (Å) | 4.0 |
| a_rcut_smth | Angle smooth cutoff start (Å) | 3.5 |
| a_sel | Max angle neighbors | 30 |
| update_style | Residual update style | "res_residual" |
| update_residual | Residual scaling factor | 0.1 |

Activation Function

DPA3 uses silut:10.0 by default. For datasets where training is unstable, consider switching to tanh:

"descriptor": {
  "type": "dpa3",
  "repflow": { ... },
  "activation_function": "tanh"
},
"fitting_net": {
  "activation_function": "tanh"
}

Optimizer

DPA3 uses AdamW by default (decoupled weight decay):

"optimizer": {
  "type": "AdamW",
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "weight_decay": 0.001
}

Gradient Clipping

Recommended for DPA3 to stabilize training:

"training": {
  "gradient_max_norm": 5.0
}

Agent Checklist

  • Training data exists and is in deepmd format
  • type_map matches the elements in the data
  • Precision is set to float32 (DPA3 default, not float64)
  • AdamW optimizer is configured with weight_decay
  • gradient_max_norm is set (recommended: 5.0)
  • stop_lr is 3e-5 (not 3.51e-8 as in SE_E2_A)
  • Virial loss prefactors are included if virial data is available
  • stat_file is set to cache statistics (avoids recomputation on restart)
  • Training completes without NaN in lcurve.out
  • Model is frozen to .pth after training
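The configuration items in this checklist can be machine-checked before launching a run. The sketch below is a lightweight illustration, not part of deepmd-kit; the preflight function name and the warning thresholds are this document's assumptions. The inline config deliberately carries SE_E2_A leftovers so the checks have something to flag:

```python
# Lightweight pre-flight check mirroring the checklist above.
# Illustrative only (not part of deepmd-kit); pass the parsed input.json dict.
import json

def preflight(cfg):
    problems = []
    if cfg["model"]["descriptor"].get("precision") != "float32":
        problems.append("descriptor precision should be float32 for DPA3")
    if cfg.get("optimizer", {}).get("type") != "AdamW":
        problems.append("DPA3 recommends the AdamW optimizer")
    if "gradient_max_norm" not in cfg.get("training", {}):
        problems.append("set training/gradient_max_norm (recommended: 5.0)")
    if cfg.get("learning_rate", {}).get("stop_lr", 0) < 1e-6:
        problems.append("stop_lr looks like an SE_E2_A value; use ~3e-5")
    if "stat_file" not in cfg.get("training", {}):
        problems.append("set training/stat_file to cache statistics")
    return problems

# An input.json fragment with SE_E2_A leftovers that the checks should catch.
cfg = json.loads("""{
  "model": {"type_map": ["O", "H"],
            "descriptor": {"type": "dpa3", "precision": "float64"}},
  "learning_rate": {"type": "exp", "stop_lr": 3.51e-8},
  "optimizer": {"type": "AdamW", "weight_decay": 0.001},
  "training": {"numb_steps": 1000000}
}""")
for p in preflight(cfg):
    print("WARN:", p)
```

Here the check flags four issues (precision, gradient clipping, stop_lr, stat_file) while accepting the AdamW optimizer; data-level items in the checklist (matching type_map, NaN-free lcurve.out) still need the runtime checks from the earlier steps.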

References