Uni-Mol
This skill provides practical command patterns for Uni-Mol molecular representation extraction, model training, and property prediction using the standardized CLI wrapper <skill_path>/scripts/unimol_helper.py.
Key behaviors (important for Agents):
- By default, the script prints the detected environment (Python/Torch/CUDA).
- Bad/illegal SMILES are skipped (the run does not crash) and logged to *.skipped.csv.
- Each run ends by printing absolute output paths, for example:
  [RESULT] repr_npy=/abs/path.npy
  [RESULT] model_dir=/abs/model_dir
  [RESULT] pred_csv=/abs/pred.csv
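When scripting around the helper, the [RESULT] lines are the simplest way to locate outputs. A minimal Python sketch, assuming each [RESULT] line carries a single key=value pair on its own line; parse_result_lines is a hypothetical name, not part of the helper:

```python
import re

# Assumption: one "[RESULT] key=value" pair per line, as in the examples above.
RESULT_RE = re.compile(r"^\[RESULT\]\s+(\w+)=(.+)$")


def parse_result_lines(stdout: str) -> dict[str, str]:
    """Collect [RESULT] key=path pairs from the helper's stdout."""
    results: dict[str, str] = {}
    for line in stdout.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            key, path = match.groups()
            results[key] = path
    return results


# Usage with a captured stdout string; non-[RESULT] lines are ignored.
captured = "...environment info...\n[RESULT] repr_npy=/abs/path.npy\n"
paths = parse_result_lines(captured)
print(paths["repr_npy"])  # /abs/path.npy
```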
Quick Start
Check CLI help:
uv run python <skill_path>/scripts/unimol_helper.py --help
Check subcommand help:
uv run python <skill_path>/scripts/unimol_helper.py repr --help
uv run python <skill_path>/scripts/unimol_helper.py train --help
uv run python <skill_path>/scripts/unimol_helper.py predict --help
Disable environment printing (optional):
uv run python <skill_path>/scripts/unimol_helper.py --no-env repr --smiles "CCO" --output out.npy
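To drive the CLI from Python, passing an argument list to subprocess sidesteps shell quoting of SMILES entirely. A sketch that uses only the flags shown in this document; build_repr_cmd, run_helper, and the /opt/skills/unimol path are hypothetical placeholders:

```python
import shlex
import subprocess


def build_repr_cmd(skill_path: str, smiles: str, output: str,
                   no_env: bool = False, no_gpu: bool = False) -> list[str]:
    """Build an argv list for the repr subcommand (hypothetical helper)."""
    cmd = ["uv", "run", "python", f"{skill_path}/scripts/unimol_helper.py"]
    if no_env:
        cmd.append("--no-env")  # global flag goes before the subcommand
    cmd += ["repr", "--smiles", smiles, "--output", output]
    if no_gpu:
        cmd.append("--no-gpu")
    return cmd


def run_helper(cmd: list[str]) -> str:
    """Run the helper and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


cmd = build_repr_cmd("/opt/skills/unimol", "[C]([H])([H])[H]", "out.npy", no_env=True)
print(shlex.join(cmd))  # shell-safe rendering, useful for logging
```

Because each argument is a separate list element, brackets and parentheses in the SMILES reach the script unchanged.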
Core Tasks
1) Extract molecular representations (embeddings) to .npy
Single SMILES:
uv run python <skill_path>/scripts/unimol_helper.py repr \
--smiles "CCO" \
--output /tmp/ccO.repr.npy
From CSV (default SMILES column is smiles):
uv run python <skill_path>/scripts/unimol_helper.py repr \
--file data.csv \
--smiles-col smiles \
--output data.repr.npy
From a .smi file:
uv run python <skill_path>/scripts/unimol_helper.py repr \
--file molecules.smi \
--output molecules.repr.npy
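A .smi file is plain text with one molecule per line; per the Agent Checklist below, the helper uses the first token of each line as the SMILES. This sketch mirrors that rule so you can preview what will be read. read_smi is a hypothetical name, and skipping blank lines is an assumption about the helper's behavior:

```python
import tempfile
from pathlib import Path


def read_smi(path: str) -> list[str]:
    """Return the first whitespace-separated token of each non-blank line."""
    smiles = []
    for line in Path(path).read_text().splitlines():
        tokens = line.split()
        if tokens:  # assumption: blank lines are ignored
            smiles.append(tokens[0])
    return smiles


# Usage: write a tiny .smi file (SMILES followed by an optional name) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    smi_path = Path(tmp) / "molecules.smi"
    smi_path.write_text("CCO ethanol\n\nc1ccccc1 benzene\n")
    molecules = read_smi(str(smi_path))
print(molecules)
```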
Force CPU / GPU:
# Force CPU
uv run python <skill_path>/scripts/unimol_helper.py repr --smiles "CCO" --no-gpu --output out.npy
# Force GPU (will warn & fall back if CUDA is unavailable)
uv run python <skill_path>/scripts/unimol_helper.py repr --smiles "CCO" --use-gpu --output out.npy
2) Train a property model (classification / regression / multilabel_*)
Regression training (CSV must contain smiles and target columns):
uv run python <skill_path>/scripts/unimol_helper.py train \
--task regression \
--input train.csv \
--smiles-col smiles \
--target-col target \
--epochs 50 \
--output ./model_reg
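If the training data comes from another source, a minimal way to produce a CSV with the default smiles and target columns uses only the standard library. A sketch; write_training_csv is hypothetical and the target values are placeholders, not measurements:

```python
import csv
import tempfile
from pathlib import Path


def write_training_csv(path, rows, smiles_col="smiles", target_col="target"):
    """Write (smiles, target) pairs using the helper's default column names."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=[smiles_col, target_col])
        writer.writeheader()
        for smiles, target in rows:
            writer.writerow({smiles_col: smiles, target_col: target})


# Placeholder targets for illustration only.
toy_rows = [("CCO", 0.5), ("c1ccccc1", 1.5)]
with tempfile.TemporaryDirectory() as tmp:
    train_path = Path(tmp) / "train.csv"
    write_training_csv(train_path, toy_rows)
    with open(train_path, newline="") as fh:
        read_back = list(csv.DictReader(fh))
print(read_back)
```

If you choose other column names, pass them through --smiles-col and --target-col.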
Classification training:
uv run python <skill_path>/scripts/unimol_helper.py train \
--task classification \
--input train.csv \
--smiles-col smiles \
--target-col target \
--epochs 50 \
--output ./model_cls
Multilabel regression training (explicit multi-target columns):
uv run python <skill_path>/scripts/unimol_helper.py train \
--task multilabel_regression \
--input train.csv \
--smiles-col smiles \
--target-cols target_0,target_1,target_2 \
--epochs 50 \
--output ./model_mreg
Multilabel classification training:
uv run python <skill_path>/scripts/unimol_helper.py train \
--task multilabel_classification \
--input train.csv \
--smiles-col smiles \
--target-cols y_cls_0,y_cls_1,y_cls_2 \
--epochs 50 \
--output ./model_mcls
Target recognition for training:
- Single-task (classification/regression): use --target-col (default target).
- Multilabel tasks: prefer --target-cols (comma-separated).
- If --target-cols is omitted for a multilabel task, the helper auto-detects columns named target or prefixed with target_ (case-insensitive).
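The auto-detection rule can be reproduced to preview which columns a multilabel run should pick up. This is a re-implementation of the rule as documented above, not the helper's own code; detect_target_cols is a hypothetical name:

```python
def detect_target_cols(columns: list[str]) -> list[str]:
    """Columns named 'target' or starting with 'target_', compared case-insensitively."""
    return [c for c in columns
            if c.lower() == "target" or c.lower().startswith("target_")]


header = ["smiles", "target_0", "TARGET_1", "Target", "y_cls_0", "targets"]
detected = detect_target_cols(header)
print(detected)  # ['target_0', 'TARGET_1', 'Target']
# Explicit equivalent for the CLI:
print("--target-cols", ",".join(detected))
```

Columns such as y_cls_0 in the multilabel classification example do not match this rule, so they must be passed explicitly via --target-cols.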
Force CPU (train uses --no-cuda rather than the --no-gpu flag shown for repr):
uv run python <skill_path>/scripts/unimol_helper.py train \
--task regression \
--input train.csv \
--epochs 50 \
--output ./model_cpu \
--no-cuda
3) Predict properties to .csv
Predict from CSV:
uv run python <skill_path>/scripts/unimol_helper.py predict \
--model ./model_reg \
--input test.csv \
--smiles-col smiles \
--output pred.csv
Predict from a .smi file:
uv run python <skill_path>/scripts/unimol_helper.py predict \
--model ./model_reg \
--input test.smi \
--output pred.csv
Notes:
- The output CSV contains the input rows (valid SMILES only) plus pred or pred_* columns.
- Bad SMILES are skipped and saved to pred.csv.skipped.csv (or to the path given by --error-log).
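After prediction, a quick sanity check is to list the pred/pred_* columns and count skipped SMILES. A sketch, assuming the default skipped-log location next to the output and that the log starts with a header row; summarize_predictions and the column names and values in the toy files are hypothetical. If you passed --error-log, point the check at that path instead:

```python
import csv
import tempfile
from pathlib import Path


def summarize_predictions(pred_csv: str) -> dict:
    """Report prediction columns, predicted row count, and skipped row count."""
    with open(pred_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        n_pred = sum(1 for _ in reader)
    pred_cols = [c for c in header if c == "pred" or c.startswith("pred_")]
    skipped = Path(pred_csv + ".skipped.csv")  # default log location
    n_skipped = 0
    if skipped.exists():
        with skipped.open(newline="") as fh:
            # Assumption: one header row, then one row per skipped SMILES.
            n_skipped = max(sum(1 for row in csv.reader(fh) if row) - 1, 0)
    return {"pred_cols": pred_cols, "n_pred": n_pred, "n_skipped": n_skipped}


with tempfile.TemporaryDirectory() as tmp:
    pred_path = str(Path(tmp) / "pred.csv")
    Path(pred_path).write_text("smiles,pred\nCCO,0.1\nCCN,0.2\n")
    Path(pred_path + ".skipped.csv").write_text("smiles,error\nC1CC,parse error\n")
    summary = summarize_predictions(pred_path)
print(summary)
```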
Agent Checklist
When using this skill for users:
- Confirm the input format: .csv requires a SMILES column (default smiles); .smi uses the first token of each line as the SMILES.
- Quote SMILES containing special characters (brackets/parentheses), for example:
  --smiles "[C]([H])([H])[H]"
- For CSV workflows, verify column names:
  - repr: --smiles-col
  - train: --smiles-col and --target-col / --target-cols
  - predict: --smiles-col
- Watch for skipped SMILES: check *.skipped.csv and decide whether to fix or permanently drop them.
- Always capture absolute output paths: look for [RESULT] ...=/abs/path in stdout.
- If debugging is needed, enable the full traceback:
UNIMOL_HELPER_TRACE=1 uv run python <skill_path>/scripts/unimol_helper.py ...
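Checking CSV headers before a long training run avoids a failed job caused by a misnamed column. A sketch; missing_columns is hypothetical, and the comparison is case-sensitive as a conservative assumption, since the helper's matching rule for --smiles-col is not documented here:

```python
import csv
import io
from typing import Iterable, TextIO


def missing_columns(fh: TextIO, required: Iterable[str]) -> list[str]:
    """Return required column names absent from the CSV header (case-sensitive)."""
    header = next(csv.reader(fh), [])
    return [col for col in required if col not in header]


# In practice: with open("train.csv", newline="") as fh: missing_columns(fh, [...])
sample = io.StringIO("SMILES,target\nCCO,0.5\n")
missing = missing_columns(sample, ["smiles", "target"])
print(missing)  # ['smiles'] -> pass --smiles-col SMILES or rename the column
```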
References
- Uni-Mol project: https://github.com/fanxiaoyu0/Uni-Mol
- RDKit: https://www.rdkit.org/