Computational Chemistry Agent Skills

rdkit-repr

molecular-representation
A standardized CLI wrapper for RDKit molecular featurization workflows that handles physicochemical descriptor computation (outputs .csv) and molecular fingerprint extraction (outputs .npy or .csv), with built-in SMILES validation. USE WHEN you need to compute RDKit molecular descriptors or fingerprints from SMILES datasets (.csv/.smi), or when you want to list all available descriptor names and presets.
v1.0. Requires uv. Dependencies (rdkit, pandas, numpy) are declared as PEP 723 inline script metadata and are installed automatically when the script is invoked with `uv run <script_path>` (do NOT use `uv run python <script_path>` -- that bypasses the inline metadata and will not install dependencies automatically).

Installation

Install folder: rdkit-repr · Repo path: molecular-representation/rdkit-repr
Copy/paste this message to your OpenClaw agent.
Please install the OpenClaw skill "rdkit-repr" on the OpenClaw host.

Steps:
- Download: https://skills.computchem.cn/skill-zips/rdkit-repr.zip
- Unzip it to get rdkit-repr/
- Copy rdkit-repr/ into the workspace skills directory (<workspace>/skills/)
- Start a NEW OpenClaw session so the skill is loaded

Then verify:
openclaw skills list --eligible
openclaw skills info rdkit-repr

RDKit Molecular Featurization

This skill provides practical command patterns for RDKit descriptor and fingerprint extraction using the standardized CLI wrapper: <skill_path>/scripts/rdkit_helper.py.

Key behaviors (important for Agents):

  • The script prints environment detection (Python/RDKit/NumPy/Pandas) by default.
  • Bad/illegal SMILES are skipped and logged to *.skipped.csv (no crash).
  • Each run ends by printing absolute output paths like:
    • [RESULT] desc_csv=/abs/path.csv
    • [RESULT] fp_npy=/abs/path.npy
    • [RESULT] fp_csv=/abs/path.csv
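
Because every run reports its outputs as `[RESULT] key=/abs/path` lines, an agent can recover the paths from captured stdout instead of guessing them. The following is a minimal sketch, assuming the lines look exactly like the three examples above; the function name `parse_result_paths` and the sample text are illustrative and not part of the skill.

```python
import re

# Matches lines such as "[RESULT] desc_csv=/abs/path.csv".
RESULT_LINE = re.compile(r"^\[RESULT\]\s+(\w+)=(.+)$")


def parse_result_paths(stdout_text):
    """Return {key: path} for every [RESULT] line found in captured stdout."""
    paths = {}
    for line in stdout_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            paths[match.group(1)] = match.group(2).strip()
    return paths


if __name__ == "__main__":
    # Made-up stdout for illustration; real runs also print environment info.
    sample = "some other output\n[RESULT] desc_csv=/abs/path.csv\n"
    print(parse_result_paths(sample))
```

Lines that do not start with `[RESULT]` are ignored, so the environment report printed by default does not interfere.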

Quick Start

Check CLI help:

uv run <skill_path>/scripts/rdkit_helper.py --help

Check subcommand help:

uv run <skill_path>/scripts/rdkit_helper.py desc --help
uv run <skill_path>/scripts/rdkit_helper.py fp --help
uv run <skill_path>/scripts/rdkit_helper.py list-desc --help

Disable environment printing (optional):

uv run <skill_path>/scripts/rdkit_helper.py --no-env desc --smiles "CCO" --output out.csv
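
When an agent launches the helper from Python rather than from a shell, the same command shape can be assembled as an argument list. This is a sketch under assumptions: `build_command` and `run_helper` are hypothetical names, and the flag placement simply mirrors the commands shown above (`--no-env` before the subcommand).

```python
import subprocess


def build_command(script_path, subcommand, args, show_env=True):
    """Assemble the argv list for one helper invocation."""
    # "uv run <script>" rather than "uv run python <script>", so the
    # PEP 723 inline metadata is used to install dependencies.
    cmd = ["uv", "run", script_path]
    if not show_env:
        # Global flag, placed before the subcommand as in the example above.
        cmd.append("--no-env")
    cmd.append(subcommand)
    cmd.extend(args)
    return cmd


def run_helper(cmd):
    """Run the command and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(build_command("rdkit_helper.py", "desc",
                        ["--smiles", "CCO", "--output", "out.csv"],
                        show_env=False))
```

Passing a list to `subprocess.run` avoids shell interpretation, so SMILES strings containing brackets or parentheses need no extra quoting on this path.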

Core Tasks

1) Compute physicochemical descriptors → .csv

Single SMILES (default preset: physchem, 25 descriptors):

uv run <skill_path>/scripts/rdkit_helper.py desc \
    --smiles "CCO" \
    --output /tmp/CCO.desc.csv

From CSV (default SMILES column is smiles):

uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv \
    --smiles-col smiles \
    --output data.desc.csv

From SMI:

uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file molecules.smi \
    --output molecules.desc.csv

Choose a descriptor preset:

# Lipinski drug-likeness (6 descriptors: MolWt, MolLogP, NumHDonors, ...)
uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv --preset lipinski --output data.lipinski.csv

# Extended physicochemical (25 descriptors, default)
uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv --preset physchem --output data.physchem.csv

# Topological / graph indices (56 descriptors: BalabanJ, BertzCT, Chi*, PEOE_VSA*, ...)
uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv --preset topological --output data.topo.csv

# All RDKit descriptors (~200 descriptors)
uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv --preset all --output data.all_desc.csv

Select specific descriptors (overrides --preset):

uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv \
    --descriptors "MolWt,MolLogP,TPSA,NumHDonors,NumHAcceptors" \
    --output data.custom.csv

Suppress merging back original CSV columns (output only smiles + descriptors):

uv run <skill_path>/scripts/rdkit_helper.py desc \
    --file data.csv --preset physchem --no-merge --output data.desc_only.csv

2) Compute molecular fingerprints → .npy or .csv

Available fingerprint types:

Type | Description | Default bits
morgan2 | Morgan circular FP radius 2 (ECFP4-like), bit vector | 2048
morgan3 | Morgan circular FP radius 3 (ECFP6-like), bit vector | 2048
morgan2_count | Morgan radius-2 count vector | 2048
rdkit | RDKit path-based FP, bit vector | 2048
maccs | MACCS 167 structural keys (bit vector, --nbits ignored) | 167
topological | Topological torsion FP (count vector, hashed to --nbits) | 2048
atompair | Atom-pair FP (count vector, hashed to --nbits) | 2048
layered | Layered substructure FP, bit vector | 2048
pattern | SMARTS pattern FP, bit vector | 2048

Single SMILES, output as NumPy array (.npy):

uv run <skill_path>/scripts/rdkit_helper.py fp \
    --smiles "CCO" \
    --type morgan2 \
    --output /tmp/CCO.morgan2.npy

From CSV, Morgan ECFP4 (2048 bits):

uv run <skill_path>/scripts/rdkit_helper.py fp \
    --file data.csv \
    --smiles-col smiles \
    --type morgan2 \
    --nbits 2048 \
    --output data.morgan2.npy

From SMI, MACCS keys (always 167 bits):

uv run <skill_path>/scripts/rdkit_helper.py fp \
    --file molecules.smi \
    --type maccs \
    --output molecules.maccs.npy

Output as CSV (smiles + bit_0 … bit_N-1 columns):

uv run <skill_path>/scripts/rdkit_helper.py fp \
    --file data.csv \
    --type rdkit \
    --nbits 1024 \
    --format csv \
    --output data.rdkfp.csv

Atom-pair fingerprint, 4096 bits:

uv run <skill_path>/scripts/rdkit_helper.py fp \
    --file data.csv \
    --type atompair \
    --nbits 4096 \
    --output data.atompair.npy

3) List available descriptors

List all descriptors and built-in presets:

uv run <skill_path>/scripts/rdkit_helper.py list-desc

List descriptors in a specific preset group:

uv run <skill_path>/scripts/rdkit_helper.py list-desc --group lipinski
uv run <skill_path>/scripts/rdkit_helper.py list-desc --group physchem
uv run <skill_path>/scripts/rdkit_helper.py list-desc --group topological
uv run <skill_path>/scripts/rdkit_helper.py list-desc --group all

Descriptor Presets Reference

Preset | Count | Typical Use
lipinski | 6 | Quick drug-likeness screening (Ro5 filter)
physchem | 25 | General ML features: MW, logP, TPSA, ring counts, charge stats, …
topological | 56 | Graph/topology indices: Balaban J, Kappa, Chi, PEOE_VSA, EState_VSA, …
all | ~200 | Full RDKit descriptor set (includes fragment counts, MQN, etc.)

Output Format Notes

desc output (CSV):

  • Columns: smiles, then one column per descriptor.
  • When --file is a .csv and --no-merge is not set, original CSV columns are appended.
  • Rows only contain valid SMILES (invalid ones are logged to *.skipped.csv).
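
Since the descriptor CSV keeps only rows with valid SMILES, comparing row counts gives a quick check on how many molecules were dropped before opening `*.skipped.csv`. The sketch below assumes one molecule per input row and one output row per valid molecule; `count_rows` and `skipped_count` are illustrative names.

```python
import csv
import io


def count_rows(handle):
    """Count data rows (header excluded) in an open CSV handle."""
    return sum(1 for _ in csv.DictReader(handle))


def skipped_count(input_handle, output_handle):
    """Rows present in the input CSV but absent from the desc output."""
    return count_rows(input_handle) - count_rows(output_handle)


if __name__ == "__main__":
    # In-memory stand-ins for data.csv and data.desc.csv (made-up content).
    source = io.StringIO("smiles\nCCO\nnot_a_smiles\nc1ccccc1\n")
    output = io.StringIO("smiles,MolWt\nCCO,46.07\nc1ccccc1,78.11\n")
    print(skipped_count(source, output))
```

With real files, open them with `open(path, newline="")` and pass the handles in the same way.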

fp output:

  • .npy (default): NumPy array of shape (N_valid, nbits), dtype uint8 (bit) or int32 (count).
  • .csv: smiles column followed by bit_0 … bit_{nbits-1} columns.
  • MACCS keys always produce 167 bits regardless of --nbits.
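
For downstream code that cannot load `.npy`, the CSV layout described above can be read with the standard library alone. This is a minimal sketch, assuming a `smiles` column plus `bit_<index>` columns holding numeric strings; `read_fp_csv` is a hypothetical helper, not part of the skill.

```python
import csv
import io


def read_fp_csv(handle):
    """Return (smiles_list, matrix) from a fingerprint CSV handle."""
    reader = csv.DictReader(handle)
    bit_cols = [c for c in (reader.fieldnames or []) if c.startswith("bit_")]
    # Sort by numeric index so bit_10 comes after bit_9, whatever the file order.
    bit_cols.sort(key=lambda name: int(name.split("_", 1)[1]))
    smiles, matrix = [], []
    for row in reader:
        smiles.append(row["smiles"])
        # int(float(...)) tolerates values written as "1" or "1.0".
        matrix.append([int(float(row[c])) for c in bit_cols])
    return smiles, matrix


if __name__ == "__main__":
    # Made-up three-bit fingerprint for illustration only.
    demo = io.StringIO("smiles,bit_0,bit_1,bit_2\nCCO,0,1,0\n")
    print(read_fp_csv(demo))
```

Each inner list has `nbits` entries (167 for MACCS), matching the second dimension of the `.npy` array.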

Agent Checklist

When using this skill for users:

  1. Confirm input format:
    • .csv requires a SMILES column (default smiles)
    • .smi uses the first token of each line as SMILES
  2. Quote SMILES containing special shell characters (brackets/parentheses):
    • Example: --smiles "[C@@H](O)(F)Cl"
  3. For CSV workflows, verify the SMILES column name:
    • Pass it with --smiles-col (accepted by both desc and fp; default smiles)
  4. Choose the right preset or fingerprint type for the downstream task:
    • Drug screening / Ro5: --preset lipinski
    • General ML featurization: --preset physchem or --type morgan2
    • Structural similarity search: --type morgan2 or --type rdkit
    • Substructure screening (key-based or prefilter): --type maccs or --type pattern
  5. Watch for skipped SMILES:
    • Check *.skipped.csv and decide whether to fix or permanently drop them
  6. Always capture absolute output paths:
    • Look for [RESULT] ...=/abs/path in stdout
  7. If debugging is needed, enable full traceback:
    • RDKIT_HELPER_TRACE=1 uv run <skill_path>/scripts/rdkit_helper.py ...
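
Checklist items 2 and 7 can be combined when an agent has to emit a shell command as text. The sketch below uses `shlex.quote` from the Python standard library to quote a SMILES string safely and optionally prefixes the `RDKIT_HELPER_TRACE=1` variable; the function name `shell_line` is an assumption made for illustration.

```python
import shlex


def shell_line(script_path, smiles, output, trace=False):
    """Build a copy-pasteable POSIX shell command for a single-SMILES desc run."""
    parts = ["uv", "run", script_path, "desc",
             "--smiles", smiles, "--output", output]
    # shlex.quote leaves safe tokens untouched and single-quotes the rest,
    # so brackets and parentheses in SMILES are not interpreted by the shell.
    line = " ".join(shlex.quote(part) for part in parts)
    if trace:
        # Environment variable prefix that enables the full traceback.
        line = "RDKIT_HELPER_TRACE=1 " + line
    return line


if __name__ == "__main__":
    print(shell_line("rdkit_helper.py", "[C@@H](O)(F)Cl", "out.csv", trace=True))
```

Single quotes are used rather than double quotes, which also protects characters such as `$` or backslashes that can appear in some SMILES or file names.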
