RDKit Conformer Generation
This skill provides practical command patterns for RDKit 3D/2D conformer generation
using the standardized CLI wrapper: <skill_path>/scripts/rdkit_conf_helper.py.
Key behaviors (important for Agents):
- The script prints environment detection (Python/RDKit/Pandas) by default.
- Multi-conformer sampling: embeds --num-confs conformers (default 10) per molecule via EmbedMultipleConfs, optimizes each with the chosen force field, and keeps the lowest-energy one. Set --num-confs 1 to revert to single-conformer behavior.
- 2D fallback: if all 3D embedding attempts fail, Compute2DCoords is used instead and a [WARN] line is printed to stderr for that molecule.
- Bad/illegal SMILES are skipped entirely and logged to *.skipped.csv (no crash).
- Molecules that fell back to 2D are additionally logged to *.fallback.csv.
- Each run ends with a summary line and absolute output paths:
  - [INFO] Done: <N_3d> 3D, <N_2d> 2D-fallback, <N_skip> skipped (total input: <N>)
  - [RESULT] conf_sdf=/abs/path.sdf
  - [RESULT] conf_xyz=/abs/path.xyz
  - [RESULT] fallback_csv=/abs/path.fallback.csv (only if any 2D fallbacks occurred)
  - [RESULT] skipped_csv=/abs/path.skipped.csv (only if any SMILES were skipped)
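An agent that captures the script's stdout can pull the summary counts and output paths out of these lines. The following is a minimal sketch, assuming the lines appear exactly in the format listed above; the function name `parse_run_output` and the dictionary keys are illustrative, not part of the script.

```python
import re

# Pattern for the summary line, as documented above.
DONE_RE = re.compile(
    r"\[INFO\] Done: (\d+) 3D, (\d+) 2D-fallback, (\d+) skipped "
    r"\(total input: (\d+)\)"
)


def parse_run_output(stdout_text):
    """Return (summary, results) extracted from captured stdout.

    summary is a dict of counts, or None if no summary line was found.
    results maps each [RESULT] key (e.g. conf_sdf) to its path.
    """
    summary = None
    results = {}
    for line in stdout_text.splitlines():
        line = line.strip()
        match = DONE_RE.search(line)
        if match:
            n_3d, n_2d, n_skipped, total = map(int, match.groups())
            summary = {
                "n_3d": n_3d,
                "n_2d": n_2d,
                "n_skipped": n_skipped,
                "total": total,
            }
        elif line.startswith("[RESULT] "):
            key, sep, path = line[len("[RESULT] "):].partition("=")
            if sep:  # ignore malformed lines without '='
                results[key] = path
    return summary, results


# Illustrative stdout, made up for this example.
sample_stdout = """\
[INFO] Done: 8 3D, 1 2D-fallback, 1 skipped (total input: 10)
[RESULT] conf_sdf=/abs/path.sdf
[RESULT] fallback_csv=/abs/path.fallback.csv
[RESULT] skipped_csv=/abs/path.skipped.csv
"""
summary, results = parse_run_output(sample_stdout)
```

Lines that match neither pattern (for example the environment detection output) are ignored, so the parser can be fed the full stdout.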
Quick Start
Check CLI help:
uv run <skill_path>/scripts/rdkit_conf_helper.py --help
uv run <skill_path>/scripts/rdkit_conf_helper.py conf --help
Disable environment printing (optional):
uv run <skill_path>/scripts/rdkit_conf_helper.py --no-env conf --smiles "CCO" --output out.sdf
Core Tasks
1) Generate 3D conformers (SDF output, default)
Single SMILES:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--smiles "CCO" \
--output /tmp/CCO.sdf
Single SMILES with a custom molecule name:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--smiles "c1ccccc1" \
--name benzene \
--output /tmp/benzene.sdf
From CSV (default SMILES column: smiles):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--smiles-col smiles \
--output data.sdf
From CSV with a name column:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--smiles-col smiles \
--name-col compound_id \
--output data.sdf
From SMI (second token per line is used as name automatically):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file molecules.smi \
--output molecules.sdf
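The SMI convention (first token is the SMILES, optional second token is the name) can be sketched as follows. This is an illustration of the convention only, not the script's code; in particular, the `mol_<i>` index used here (zero-based, counting non-blank lines) is an assumption.

```python
def parse_smi_text(text):
    """Split .smi content into (smiles, name) pairs.

    The first whitespace-separated token is the SMILES; the second,
    if present, is the name. Otherwise a mol_<i> name is generated
    (the indexing scheme here is an assumption).
    """
    records = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        smiles = tokens[0]
        name = tokens[1] if len(tokens) > 1 else f"mol_{len(records)}"
        records.append((smiles, name))
    return records


records = parse_smi_text("CCO ethanol\n\nc1ccccc1\n")
```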
2) Control conformer sampling count
Default (10 conformers sampled, lowest-energy kept):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --output data.sdf
Single conformer (fastest, least thorough):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --num-confs 1 --output data.sdf
Increase sampling for flexible or macrocyclic molecules:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --num-confs 50 --output data.sdf
3) Choose force-field minimization
MMFF94s (default; falls back to UFF when MMFF parameters are unavailable for a molecule):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --ff mmff94s --output data.mmff.sdf
UFF (universal force field):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --ff uff --output data.uff.sdf
Skip force-field optimization (raw ETKDG geometry only):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --ff none --output data.etkdg_raw.sdf
4) XYZ output
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--format xyz \
--output data.xyz
5) Tuning embedding for difficult molecules
Large or macrocyclic molecules sometimes fail standard ETKDG; try random initial coordinates:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file macrocycles.csv \
--use-random-coords \
--max-attempts 500 \
--output macrocycles.sdf
Use a different random seed (reproducibility):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --seed 123 --output data.seed123.sdf
Non-deterministic embedding (seed = -1):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --seed -1 --output data.sdf
6) Suppress hydrogen addition
By default, explicit H atoms are added before embedding for more accurate 3D geometry.
Use --no-hs to keep the molecule as-is (heavy atoms only):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --no-hs --output data.noh.sdf
7) Custom log file paths
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--output data.sdf \
--error-log logs/skipped.csv \
--fallback-log logs/used_2d.csv
3D Embedding Pipeline Details
For each molecule, the script runs the following steps in order:
- Parse SMILES via Chem.MolFromSmiles.
- Add hydrogens (Chem.AddHs); skipped with --no-hs.
- Multi-conformer 3D embedding (EmbedMultipleConfs, --num-confs candidates, default 10): tries ETKDGv3, then ETKDGv2, then ETDG, then ETDG + useRandomCoords as a fallback chain until at least one conformer is embedded.
- Force-field minimization (if --ff is not none): each successfully embedded conformer is individually optimized. MMFF94s transparently falls back to UFF if parameters are unavailable for that molecule.
- Lowest-energy selection: the conformer with the minimum post-optimization energy is retained; all others are discarded. If --ff none, the first embedded conformer is kept without energy ranking.
- 2D fallback (if all 3D attempts yield zero conformers): generates a flat 2D layout via Compute2DCoords (Z=0 for all atoms), prints a [WARN] to stderr, and records the molecule in the fallback log.
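The control flow of the fallback chain and the lowest-energy selection can be sketched in plain Python. This is not the script's implementation: the embedding strategies are stand-in callables, and `embed_with_fallbacks` and `select_conformer` are hypothetical names. In RDKit itself, AllChem.EmbedMultipleConfs returns the embedded conformer IDs and AllChem.MMFFOptimizeMoleculeConfs returns one (not_converged, energy) pair per conformer, which is the shape assumed below.

```python
def embed_with_fallbacks(strategies):
    """Try each (label, embed) strategy in order.

    Each embed callable returns a list of conformer IDs. The first
    strategy that yields at least one conformer wins.
    """
    for label, embed in strategies:
        conf_ids = list(embed())
        if conf_ids:
            return label, conf_ids
    return None, []  # caller would fall back to 2D coordinates here


def select_conformer(conf_ids, opt_results=None):
    """Pick the conformer to keep.

    opt_results holds one (not_converged, energy) pair per conformer,
    in the same order as conf_ids. With no optimization results
    (the --ff none case), the first embedded conformer is kept.
    """
    if not conf_ids:
        return None
    if opt_results is None:
        return conf_ids[0]
    energies = [energy for _not_converged, energy in opt_results]
    best = min(range(len(conf_ids)), key=energies.__getitem__)
    return conf_ids[best]


# Stand-in strategies: the first fails, the second yields three conformers.
strategies = [
    ("ETKDGv3", lambda: []),
    ("ETKDGv2", lambda: [0, 1, 2]),
    ("ETDG", lambda: [0]),
]
label, conf_ids = embed_with_fallbacks(strategies)

# Made-up energies for illustration only.
opt_results = [(0, 41.7), (0, 38.2), (1, 39.9)]
best_id = select_conformer(conf_ids, opt_results)
print(label, best_id)  # prints: ETKDGv2 1
```

An empty result from `embed_with_fallbacks` corresponds to the 2D fallback step described in the last bullet.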
Output Format Notes
SDF output (--format sdf, default):
- Standard V2000 multi-molecule SDF, one conformer per molecule.
- Molecule name (from --name, --name-col, or auto-generated mol_<i>) is written to the SDF header line.
- Compatible with most cheminformatics tools (RDKit, Open Babel, Schrödinger, etc.).
XYZ output (--format xyz):
- Concatenated XYZ blocks (element, x, y, z per atom).
- Molecule name is written as the comment line (second line of each block).
- Coordinates are in Angstroms.
- Note: if --no-hs is used, hydrogen atoms are absent from the XYZ.
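Concatenated XYZ output can be read back block by block. The sketch below assumes the conventional XYZ layout, in which each block starts with an atom-count line, followed by the comment line carrying the molecule name; the atom-count line is an assumption, since only the comment line and the per-atom fields are described above. The function names and the sample coordinates are illustrative.

```python
def read_xyz_blocks(text):
    """Parse concatenated XYZ blocks into (name, atoms) pairs.

    atoms is a list of (element, x, y, z) tuples, coordinates in Angstroms.
    """
    lines = text.splitlines()
    blocks = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # tolerate blank lines between blocks
            continue
        n_atoms = int(lines[pos].strip())   # assumed atom-count line
        name = lines[pos + 1].strip()       # comment line = molecule name
        atoms = []
        for row in lines[pos + 2 : pos + 2 + n_atoms]:
            element, x, y, z = row.split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        blocks.append((name, atoms))
        pos += 2 + n_atoms
    return blocks


def is_flat(atoms):
    """True if every z coordinate is exactly zero (a hint of a 2D layout)."""
    return all(z == 0.0 for _element, _x, _y, z in atoms)


# Made-up coordinates for illustration.
sample_xyz = """\
3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
2
flat_example
C 0.000 0.000 0.000
O 1.200 0.000 0.000
"""
blocks = read_xyz_blocks(sample_xyz)
```

The `is_flat` check is only a heuristic; the fallback log remains the authoritative record of which molecules used 2D coordinates.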
Fallback log (*.fallback.csv):
- Written only when at least one molecule fell back to 2D.
- Columns: idx, smiles, name, dim (always 2), ff (always 2d_fallback), note.
Skipped log (*.skipped.csv):
- Written only when at least one SMILES was skipped.
- Columns: idx, smiles, error.
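Both logs are plain CSV files and can be read with the standard library. A minimal sketch, assuming the column names listed above appear as a header row; the row contents below (indices, note and error text) are made up for illustration, and `read_log` and `smiles_to_retry` are hypothetical helper names.

```python
import csv
import io


def read_log(handle):
    """Return the rows of a skipped or fallback log as a list of dicts."""
    return list(csv.DictReader(handle))


def smiles_to_retry(fallback_rows):
    """Collect SMILES that fell back to 2D, e.g. to rerun with other flags."""
    return [row["smiles"] for row in fallback_rows]


# In-memory stand-ins for *.fallback.csv and *.skipped.csv.
fallback_csv = io.StringIO(
    "idx,smiles,name,dim,ff,note\n"
    "4,C1CCCCCCCCCCCCC1,ring14,2,2d_fallback,illustrative note\n"
)
skipped_csv = io.StringIO(
    "idx,smiles,error\n"
    "7,C1CC,illustrative error text\n"
)

fallback_rows = read_log(fallback_csv)
skipped_rows = read_log(skipped_csv)
retry = smiles_to_retry(fallback_rows)
```

With real files, pass `open(path, newline="")` instead of the `io.StringIO` objects.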
Agent Checklist
When using this skill for users:
- Confirm input format:
  - .csv requires a SMILES column (default smiles)
  - .smi uses the first token per line as SMILES, second token (if present) as name
- Quote SMILES containing special shell characters (brackets/parentheses):
  - Example: --smiles "[C@@H](O)(F)Cl"
- For CSV workflows, verify column names:
  - --smiles-col for the SMILES column
  - --name-col (optional) for molecule identifiers to embed in SDF/XYZ headers
- Check the [INFO] Done: summary line for the 3D/2D/skip breakdown.
- If 2D fallbacks occurred, inspect *.fallback.csv:
  - Consider --use-random-coords or --max-attempts tuning for the affected SMILES.
  - 2D conformers have Z=0 and are not suitable for 3D-based applications (docking, 3D QSAR).
- Always capture absolute output paths:
  - Look for [RESULT] ...=/abs/path in stdout.
- If debugging is needed, enable full traceback:
  RDKIT_CONF_HELPER_TRACE=1 uv run <skill_path>/scripts/rdkit_conf_helper.py ...
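When the script is launched from Python rather than from a shell, passing the arguments as a list sidesteps the quoting issue, and the trace variable can be set through the process environment. A minimal sketch using only the flags and the environment variable documented above; `build_smiles_command` and `with_trace` are hypothetical helpers, and `<skill_path>` is left as a placeholder.

```python
import os
import shlex


def build_smiles_command(script_path, smiles, output_path):
    """Build the argument list for a single-SMILES conf run."""
    return [
        "uv", "run", script_path, "conf",
        "--smiles", smiles,
        "--output", output_path,
    ]


def with_trace(env=None):
    """Return a copy of the environment with full tracebacks enabled."""
    merged = dict(os.environ if env is None else env)
    merged["RDKIT_CONF_HELPER_TRACE"] = "1"
    return merged


cmd = build_smiles_command(
    "<skill_path>/scripts/rdkit_conf_helper.py",  # placeholder path
    "[C@@H](O)(F)Cl",
    "/tmp/chiral.sdf",
)
# shlex.join shows the shell-quoted equivalent of the argument list.
print(shlex.join(cmd))
```

The list can then be handed to `subprocess.run(cmd, env=with_trace(), capture_output=True, text=True)`, which also makes the stdout available for reading the [RESULT] lines.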
References
- RDKit conformer generation guide: https://www.rdkit.org/docs/GettingStartedInPython.html#working-with-3d-molecules
- ETKDG paper: Riniker & Landrum, J. Chem. Inf. Model. 2015, 55, 2562
- ETKDGv3: Wang et al., J. Chem. Inf. Model. 2020, 60, 2044