Tutorial: In Silico Screening

This tutorial covers how to perform in silico filtering of a set of molecules represented as SMILES. The table "smiles to filter" below contains example SMILES representing molecules with different functional groups.

smiles to filter
smiles
CN1C=NC2=C1C(=O)N(C(=O)N2C)C
OC(=O)C1CNC2C3C4CC2C1N34
C1=CC=CC=C1
OC(=O)C1CNC2COC1(C2)C#C
CCO
CCCCCCCCC=CCCCCCCCC(=O)O
CC(=O)O
O=C(O)Cc1ccccc1
CC(C(=O)O)O

The filtering process consists of excluding (or including) a set of molecules based on structural characteristics, such as their functional groups, or derived properties, such as bulkiness.

To run the simulation, the user must provide two files: one containing the SMILES to filter and another containing the values of the properties used as filters.

Simulation input

The SMILES input should be in CSV format, for example:

,smiles
,CC(=O)O
,CCC(=O)O

The properties specification file used to perform the filtering must be a YAML file following this schema:

smiles_file:
  smiles.csv

core:
  "Cd68Se55.xyz"

anchor:
  "O(C=O)[H]"

batch_size: 1000

filters:
  include_functional_groups:
    groups:
       - "[CX3](=O)[OX2H1]" # Include carboxylic acids
    maximum: 1
  exclude_functional_groups:
    groups:
       - "[NX3]"  # Exclude nitrogens with three connections (e.g. amines)
       - "C#C"    # Exclude carbon-carbon triple bonds
  scscore:
    lower_than:
      3.0
  bulkiness:
    lower_than: 20

The smiles_file entry contains the path to the file containing the SMILES. The other keywords are explained in the following sections.

Available filters

Note

The filters are run sequentially: the second filter is applied to the molecules remaining after the first filter, the third to those remaining after the second, and so on.
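The sequential behaviour described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: apply_filters and the two toy predicates are hypothetical names, and the string-based checks stand in for the SMARTS matching and property calculations that real filters would perform.

```python
from typing import Callable, Iterable, List

# A filter is modelled as a predicate: True means "keep this molecule".
Filter = Callable[[str], bool]


def apply_filters(smiles: Iterable[str], filters: List[Filter]) -> List[str]:
    """Apply each filter, in order, to the survivors of the previous one."""
    remaining = list(smiles)
    for keep in filters:
        remaining = [s for s in remaining if keep(s)]
    return remaining


# Toy predicates for illustration only (assumptions, not Flamingo filters).
def has_no_triple_bond(s: str) -> bool:
    return "#" not in s


def is_short(s: str) -> bool:
    return len(s) <= 10


candidates = [
    "CCO",
    "OC(=O)C1CNC2COC1(C2)C#C",
    "CC(=O)O",
    "CCCCCCCCC=CCCCCCCCC(=O)O",
]
survivors = apply_filters(candidates, [has_no_triple_bond, is_short])
print(survivors)
```

Because each filter only sees the survivors of the previous one, placing cheap filters first reduces the number of molecules that reach the more expensive ones.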

1. Include and exclude functional groups

The include_functional_groups and exclude_functional_groups filters, as their names suggest, keep and drop molecules based on a list of functional groups represented as SMARTS patterns.

The maximum keyword sets the maximum number of functional groups that can be present.

2. Synthesizability scores

The scscore is a measure of synthetic complexity. It is scaled from 1 to 5 to facilitate human interpretation. See the scscore paper for further details.

3. Bulkiness

Assuming that a given molecule can be attached to a given surface, the bulkiness descriptor gives a measure of the volume occupied by the molecule from the anchoring point extending outwards as a cone. It requires the core keyword, which specifies the surface to attach the molecule to, and the anchor keyword, which specifies the functional group used for the attachment. See the CAT bulkiness documentation for further information.

Running the filtering script

To perform the screening, execute the following command:
smiles_screener -i path_to_yaml_input.yml

Job distributions and results

For a given filter, Flamingo tries to compute the molecular properties in parallel, since the properties can be computed independently for each molecule. To do so, Flamingo splits the molecular set into batches that can be processed in parallel. The batch_size keyword controls the size of these batches.
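The effect of batch_size can be illustrated with a minimal sketch. Flamingo's own splitting code is not shown in this tutorial, so split_into_batches is a hypothetical helper that only demonstrates the idea of cutting the molecular set into chunks of at most batch_size molecules.

```python
from typing import Iterator, List, Sequence


def split_into_batches(molecules: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks holding at most ``batch_size`` molecules."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(molecules), batch_size):
        yield list(molecules[start:start + batch_size])


molecules = ["CCO", "CC(=O)O", "C1=CC=CC=C1", "CC(C(=O)O)O", "O=C(O)Cc1ccccc1"]
batches = list(split_into_batches(molecules, batch_size=2))
for index, batch in enumerate(batches):
    print(index, batch)
```

With five molecules and a batch size of 2, the last batch holds the single leftover molecule. Each batch could then be handed to a separate worker.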

After the computation has finished, the filtered molecules are stored in the results folder in the current working directory. That folder contains one candidates.csv file per batch with the final molecules.
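To gather the per-batch output into a single list, something like the following sketch may help. It rests on two assumptions that the tutorial does not confirm: that the candidates.csv files sit somewhere below the results folder (they are located with a recursive search) and that each file has a smiles column like the input. The helper name collect_candidates is hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def collect_candidates(results_dir: str = "results") -> List[str]:
    """Gather the SMILES from every candidates.csv found under results_dir.

    Assumes each file has a ``smiles`` column (not confirmed by the tutorial).
    """
    collected: List[str] = []
    # sorted() makes the traversal order deterministic across runs.
    for path in sorted(Path(results_dir).rglob("candidates.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if row.get("smiles"):
                    collected.append(row["smiles"])
    return collected
```

Calling collect_candidates() from the directory where the screening was run would then return the surviving SMILES from all batches.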