Search Species
🔄 Core Workflow (CRITICAL)
When assisting users with chemical searches, you MUST adhere to the following step-by-step workflow. Note: Searching can be highly time-consuming; always prioritize efficiency.
- Acquire Target: Identify the chemical name, identifier, or SMILES the user wants to query.
- Select Engine: Choose the most targeted search backend (`pubchem`, `opsin`, `wikidata`, or `all`) based on the query type. Avoid using `all` unless strictly necessary, to minimize search times.
- Execute Search: Use the `search` command to query the database. You must set an appropriate `max_cands` limit to prevent excessively long processing times and reduce data noise.
- Evaluate Results: Carefully review the returned summary data in the output.
- Confirm & Iterate: Present the retrieved data to the user for confirmation. If the result is ambiguous or incorrect, communicate with the user to adjust the search keywords and restart the process.
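The Execute Search step can be sketched as a small shell wrapper. This is an illustration only, not part of the toolkit: `run_search` and the `SEARCH_RUNNER` variable are hypothetical names, and the defaults of 5 candidates and `./results` are assumptions borrowed from the PubChem example in this document. The wrapper rejects unknown engines and always passes an explicit `max_cands`.

```shell
# Hypothetical wrapper (not shipped with search-species).
# Usage: run_search <engine> "<query>" [max_cands] [output_dir]
run_search() {
  engine=$1 query=$2 max_cands=${3:-5} out_dir=${4:-./results}

  # Only the engine names listed in the Core Workflow are accepted.
  case "$engine" in
    pubchem|opsin|wikidata|all) ;;
    *) echo "unknown engine: $engine" >&2; return 2 ;;
  esac

  # Refuse anything that is not a plain integer candidate limit.
  case "$max_cands" in
    ''|*[!0-9]*) echo "max_cands must be an integer" >&2; return 2 ;;
  esac

  # SEARCH_RUNNER is an assumption added for dry runs; it defaults to uvx.
  ${SEARCH_RUNNER:-uvx} search-species "$engine" "$query" "$max_cands" -o "$out_dir"
}
```

Setting `SEARCH_RUNNER=echo` prints the arguments that would be passed instead of running the search, which is a cheap way to confirm the engine and limit before committing to a slow query.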
Search Backend Overview
search-species integrates three distinct backends. Each serves a specific purpose in the chemical informatics workflow:
| Feature | OPSIN | PubChem | Wikidata |
|---|---|---|---|
| Core Method | Algorithmic Parser | Curated Database | Knowledge Graph |
| Primary Input | IUPAC English Names | Names, CIDs, SMILES | Common & Multilingual Names |
| Molecular Image | Supported (Rendered) | Supported (Stored) | Rarely Available |
| Mass/Formula | Calculated via RDKit | Database Metadata | Database Metadata |
| Key Strength | Handles theoretical molecules. | Highly standardized data. | Vernacular & Cross-lingual. |
(For more detailed engine capabilities, limitations, and data normalization behavior, see reference/backends.md)
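One way to read the Primary Input row is as a lookup from query type to backend. The sketch below assumes the caller has already classified the query; the type labels (`iupac`, `cid`, `smiles`, `name`, `common`, `trade`, `multilingual`) and the function name `engine_for` are hypothetical and do not come from the tool itself.

```shell
# Hypothetical lookup based on the "Primary Input" row of the overview table.
# Usage: engine_for <query_type>
engine_for() {
  case "$1" in
    iupac)                     echo opsin ;;     # systematic IUPAC English names
    cid|smiles|name)           echo pubchem ;;   # names, CIDs, SMILES
    common|trade|multilingual) echo wikidata ;;  # vernacular and non-English names
    *)                         echo all ;;       # last resort: slowest option
  esac
}
```

Falling through to `all` only for unclassified input keeps the slow combined search as the exception, in line with the workflow's efficiency note.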
Quick Start & Command Outputs
Typical search syntax:
uvx search-species <engine> "<query>" [max_cands] -o <output_dir>
Output: Prints the retrieved species data summary and the file path where each candidate's JSON is saved (e.g., `SpeciesCandidate(...) written -> ./cache/xyz.json`).
Typical render syntax:
uvx --from search-species render-species <candidate_files...> -o <output_dir>
Output: Prints the file path of the successfully generated image card (e.g., `Successfully rendered -> ./gallery/xyz.png`).
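To hand search results to the render step, the saved JSON paths have to be pulled out of the search output. The filter below is a sketch that assumes every saved candidate is reported on its own line ending in ` written -> <path>.json`, exactly as in the example line above; the real output may differ, and `extract_candidate_paths` is a hypothetical name.

```shell
# Hypothetical filter: reads search output on stdin, prints one JSON path per line.
# Assumes lines shaped like: SpeciesCandidate(...) written -> ./cache/xyz.json
extract_candidate_paths() {
  # Keep only lines that end in "written -> <something>.json" and print the path.
  sed -n 's/^.* written -> \(.*\.json\)[[:space:]]*$/\1/p'
}
```

Lines that do not match the assumed pattern, such as the summary text, are silently dropped, so an empty result is a hint that the output format differs from the example.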
Examples
PubChem (Standard database lookups):
uvx search-species pubchem "benzene" 5 -o ./results
OPSIN (Theoretical molecules & strict IUPAC):
uvx search-species opsin "2-acetyloxybenzoic acid"
Wikidata (Multilingual & common/trade names):
uvx search-species wikidata "Аспирин"
uvx search-species wikidata "TNT"
Agent Checklist
When using this toolkit on a user's behalf, cross-check these points against the Core Workflow:
- Engine Match: Match the engine to the query type based on the overview table.
- Data Scope: Remember this tool only retrieves structural identity (Name, Formula, Mass, SMILES, 2D Image).
- Fallback: If `pubchem` fails on a systematic name, fall back to `opsin`.
- Quoting: Always wrap the chemical `<query>` in quotes.
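The Fallback and Quoting points can be combined in one helper. This sketch assumes a failed PubChem lookup is signalled by a non-zero exit status, which this document does not confirm; `search_with_fallback` and `SEARCH_RUNNER` are hypothetical names, and the defaults are assumptions taken from the PubChem example.

```shell
# Hypothetical helper: try pubchem first, then opsin if pubchem fails.
# Usage: search_with_fallback "<query>" [max_cands] [output_dir]
search_with_fallback() {
  query=$1 max_cands=${2:-5} out_dir=${3:-./results}
  runner=${SEARCH_RUNNER:-uvx}   # assumption: overridable for dry runs

  # "$query" stays double-quoted so names containing spaces arrive as one argument.
  # Assumption: a non-zero exit status means the pubchem lookup failed.
  $runner search-species pubchem "$query" "$max_cands" -o "$out_dir" ||
    $runner search-species opsin "$query" "$max_cands" -o "$out_dir"
}
```

If the tool reports an empty result with a zero exit status instead, the check would need to inspect the printed summary rather than the status code.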
References
- Engine Details & Limitations: reference/backends.md