DeePMD-kit Training: DPA3

Train a deep potential model using the DPA3 descriptor, an advanced message-passing architecture operating on Line Graph Series (LiGS). DPA3 is designed as a large atomic model (LAM) with high fitting accuracy and robust generalization across diverse chemical and materials systems.

Quick Start

dp --pt train input.json

Agent Responsibilities

Confirm the user has a working deepmd-kit environment with PyTorch backend.
Collect the minimum required information:
- Training data paths (deepmd/npy or deepmd/hdf5 format)
- Validation data paths
- Element types (type_map)
- Target number of training steps
- Model size preference (L3/L6/L12 layers)
Generate a complete input.json training configuration.
Decide whether to use fixed or dynamic neighbor selection based on system diversity.
Run training and monitor the learning curve.
Freeze the trained model and optionally test it.

Workflow

Step 1: Prepare Training Data

Same format as other DeePMD models. Each system directory should contain:

system_dir/
├── type.raw
├── type_map.raw
└── set.000/
    ├── coord.npy
    ├── energy.npy
    ├── force.npy
    ├── box.npy
    └── virial.npy

DPA3 also supports the mixed type data format for multi-element systems.

Step 2: Write input.json

Standard DPA3 (fixed selection)

{
  "model": {
    "type_map": [
      "O",
      "H"
    ],
    "descriptor": {
      "type": "dpa3",
      "repflow": {
        "n_dim": 128,
        "e_dim": 64,
        "a_dim": 32,
        "nlayers": 6,
        "e_rcut": 6.0,
        "e_rcut_smth": 5.3,
        "e_sel": 120,
        "a_rcut": 4.0,
        "a_rcut_smth": 3.5,
        "a_sel": 30,
        "axis_neuron": 4,
        "fix_stat_std": 0.3,
        "a_compress_rate": 1,
        "a_compress_e_rate": 2,
        "a_compress_use_split": true,
        "update_angle": true,
        "smooth_edge_update": true,
        "edge_init_use_dist": true,
        "use_exp_switch": true,
        "update_style": "res_residual",
        "update_residual": 0.1,
        "update_residual_init": "const"
      },
      "activation_function": "silut:10.0",
      "use_tebd_bias": false,
      "precision": "float32",
      "concat_output_tebd": false,
      "seed": 1
    },
    "fitting_net": {
      "neuron": [
        240,
        240,
        240
      ],
      "resnet_dt": true,
      "precision": "float32",
      "activation_function": "silut:10.0",
      "seed": 1
    }
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3e-05
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.2,
    "limit_pref_e": 20,
    "start_pref_f": 100,
    "limit_pref_f": 60,
    "start_pref_v": 0.02,
    "limit_pref_v": 1
  },
  "optimizer": {
    "type": "AdamW",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "weight_decay": 0.001
  },
  "training": {
    "stat_file": "./dpa3.hdf5",
    "training_data": {
      "systems": [
        "./data/train_0",
        "./data/train_1",
        "./data/train_2"
      ],
      "batch_size": 1
    },
    "validation_data": {
      "systems": [
        "./data/valid_0"
      ],
      "batch_size": 1
    },
    "numb_steps": 1000000,
    "gradient_max_norm": 5.0,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 2000
  }
}

If you do not want to train on virial, set the virial prefactors to 0.

DPA3 uses different default loss prefactors compared to SE_E2_A. See the comparison table in the “Key Differences from SE_E2_A” section below.

The meaning of each parameter can be generated through dp doc-train-input. Considering the output RST documentation on the screen is very long, use grep to find the documentation of a specific parameter:

dp doc-train-input | grep -A 7 training/numb_steps
dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[dpa3\]/repflow/e_sel'

DPA3 with Dynamic Selection

For systems with highly variable neighbor counts (e.g., multi-element datasets), use dynamic selection by modifying the repflow section:

"repflow": {
  "e_sel": 1200,
  "a_sel": 300,
  "use_dynamic_sel": true,
  "sel_reduce_factor": 10.0
}

When use_dynamic_sel is true, the effective selection is e_sel / sel_reduce_factor and a_sel / sel_reduce_factor (i.e., 120 and 30 in this example), but the model dynamically adapts to varying neighbor counts.

Step 3: Run Training

dp --pt train input.json

To restart from a checkpoint:

dp --pt train input.json --restart model.ckpt.pt

Step 4: Monitor Training

The learning curve is written to lcurve.out with columns:

#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn  rmse_v_val  rmse_v_trn  lr

rmse_e_*: energy RMSE per atom (eV/atom)
rmse_f_*: force RMSE (eV/A)
rmse_v_*: virial RMSE (eV/atom, only present if virial data is available)
lr: current learning rate

Step 5: Freeze the Model

dp --pt freeze -o model.pth

Step 6: Test the Model

dp --pt test -m model.pth -s /path/to/test_system -n 30

Model Size Guide

Choose the number of layers based on accuracy vs. cost trade-off:

Model	nlayers	n_dim	e_dim	a_dim	Relative Cost	Use Case
DPA3-L3	3	256	128	32	1x	Quick prototyping, smaller systems
DPA3-L3-small	3	128	64	32	0.8x	Fast iteration, limited GPU memory
DPA3-L6	6	256	128	32	2x	Recommended for production
DPA3-L6-small	6	128	64	32	1.4x	Good accuracy/cost balance

Benchmark RMSE (averaged over 6 representative systems, 0.5M steps):

Model	Energy (meV/atom)	Force (meV/A)	Virial (meV/atom)
DPA3-L3 (256/128/32)	5.74	85.4	43.1
DPA3-L3-small (128/64/32)	6.99	93.6	46.7
DPA3-L6 (256/128/32)	4.85	79.9	39.7
DPA3-L6-small (128/64/32)	5.11	77.7	41.2
DPA2-L6 (reference)	12.12	109.3	83.1

Key Differences from SE_E2_A

Aspect	SE_E2_A	DPA3
Architecture	Two-body embedding	Message passing on LiGS
Default precision	float64	float32
Optimizer	Adam	AdamW (with weight_decay)
Loss prefactors	e: 0.02→1, f: 1000→1	e: 0.2→20, f: 100→60, v: 0.02→1
stop_lr	3.51e-8	3e-5
Gradient clipping	Not used	gradient_max_norm: 5.0
Virial training	Optional	Recommended
Model compression	Supported	Not supported
Activation	tanh (default)	silut:10.0

Key Hyperparameters

Repflow (Descriptor)

Parameter	Description	Default
`n_dim`	Node embedding dimension	128 or 256
`e_dim`	Edge embedding dimension	64 or 128
`a_dim`	Angle embedding dimension	32
`nlayers`	Number of message passing layers	3 or 6
`e_rcut`	Edge cutoff radius (A)	6.0
`e_rcut_smth`	Edge smooth cutoff start	5.3
`e_sel`	Max edge neighbors	120
`a_rcut`	Angle cutoff radius (A)	4.0
`a_rcut_smth`	Angle smooth cutoff start	3.5
`a_sel`	Max angle neighbors	30
`update_style`	Residual update style	”res_residual”
`update_residual`	Residual scaling factor	0.1

Activation Function

DPA3 uses silut:10.0 by default. For datasets where training is unstable, consider switching to tanh:

"descriptor": {
  "type": "dpa3",
  "repflow": { ... },
  "activation_function": "tanh"
},
"fitting_net": {
  "activation_function": "tanh"
}

Optimizer

DPA3 uses AdamW by default (decoupled weight decay):

"optimizer": {
  "type": "AdamW",
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "weight_decay": 0.001
}

Gradient Clipping

Recommended for DPA3 to stabilize training:

"training": {
  "gradient_max_norm": 5.0
}

deepmd-train-dpa3

Installation