This page features a few code examples that demonstrate what we hope are the main selling points of Molpher-lib. The library can generally do much more, though, and many of these examples are explained more thoroughly in the documentation.
The main use case for Molpher-lib is the automatic generation of new molecular structures from a given starting point. Which 'derivatives' we get depends on the set of morphing operators used. These operators are not restricted to any particular kind of transformation. In the original Molpher approach, they are a set of somewhat arbitrarily chosen structural modifications (add atom, add bond, remove atom, remove bond...), but they can also be elementary chemical transformations or any other transformations that might be of interest.
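To make the role of the operators concrete, here is a minimal pure-Python sketch of the generate-and-collect idea. It is not Molpher-lib's implementation: the "molecule" is a hypothetical toy representation (a tuple of element symbols), and the Toy* operator classes and the generate_morphs helper are made up for illustration. Each attempt applies one randomly chosen operator to the starting structure and hands every successful result to the collectors, loosely mirroring the attempts and collectors arguments of the Molpher class used in the example below.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical toy representation: a "molecule" is just a tuple of element symbols.
Mol = Tuple[str, ...]


class ToyAddAtom:
    """Toy operator: appends a carbon atom."""

    def getName(self) -> str:
        return "AddAtom"

    def morph(self, mol: Mol, rng: random.Random) -> Optional[Mol]:
        return mol + ("C",)


class ToyRemoveAtom:
    """Toy operator: removes a random atom; fails (returns None) on one-atom molecules."""

    def getName(self) -> str:
        return "RemoveAtom"

    def morph(self, mol: Mol, rng: random.Random) -> Optional[Mol]:
        if len(mol) <= 1:
            return None
        idx = rng.randrange(len(mol))
        return mol[:idx] + mol[idx + 1:]


class ToyMutateAtom:
    """Toy operator: replaces a random atom with a different element."""

    def getName(self) -> str:
        return "MutateAtom"

    def morph(self, mol: Mol, rng: random.Random) -> Optional[Mol]:
        idx = rng.randrange(len(mol))
        choices = [s for s in ("C", "N", "O", "S") if s != mol[idx]]
        return mol[:idx] + (rng.choice(choices),) + mol[idx + 1:]


# A collector receives each new morph together with the operator that produced it.
Collector = Callable[[Mol, object], None]


def generate_morphs(original: Mol, operators: Sequence, attempts: int,
                    collectors: Sequence[Collector], seed: int = 0) -> None:
    """One randomly chosen operator per attempt; successful morphs go to every collector."""
    rng = random.Random(seed)
    for _ in range(attempts):
        operator = rng.choice(operators)
        morph = operator.morph(original, rng)
        if morph is None:
            continue  # a failed attempt yields no morph
        for collect in collectors:
            collect(morph, operator)


# Usage: keep unique morphs and remember which operator created each one.
collected: Dict[Mol, str] = {}


def collect_all(morph: Mol, operator) -> None:
    collected[morph] = operator.getName()


generate_morphs(("C", "C", "O"), [ToyAddAtom(), ToyRemoveAtom(), ToyMutateAtom()],
                attempts=20, collectors=[collect_all])
```

Keeping the operators behind a common getName()/morph() shape is what lets new transformations be swapped in without touching the generation loop.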
Here is an example of how the library can be used to generate new structural analogs of captopril, a well-known antihypertensive drug:
```python
from rdkit import Chem
from molpher.core import MolpherMol
from molpher.core.morphing import Molpher
from molpher.core.morphing.operators import *

# define a collector -> a callback function that processes morphs as they are generated
strange_patterns = Chem.MolFromSmarts('[S,O,N][F,Cl,Br,I]')
sensible_morphs = dict()

def collect_sensible(morph, operator):
    """
    simple collector, accepts morphs without some weird structural patterns
    """

    rd_morph = morph.asRDMol()
    if not rd_morph.HasSubstructMatch(strange_patterns):
        sensible_morphs[morph.smiles] = morph

# load a molecule from SDF and generate some derived molecules with given morphing operators
mol = MolpherMol("captopril.sdf")
molpher = Molpher(
    mol
    , [ # list of morphing operators to use
        AddAtom()
        , RemoveAtom()
        , MutateAtom()
        , AddBond()
        , RemoveBond()
        , ContractBond()
        , InterlayAtom()
        , RerouteBond()
    ]
    , attempts = 100 # create at most 100 molecules
    , collectors = [collect_sensible]
)

# execute morphing and show created molecules
molpher()
# as_mol_grid() is a drawing helper whose definition is not part of this example
as_mol_grid(sensible_morphs.values()) # draw generated structures in a grid
```
The code above shows basic usage of the Molpher class. It demonstrates how a set of 'derivatives' can be formed from a source compound using various morphing operators implemented in Molpher-lib.
Six morphs cropped out of the grid image depicting the collected morphs. The image was generated with the as_mol_grid function (its definition is not part of the example), which also highlights locked atoms in red.
You might notice that the typical '-pril' structural pattern is preserved in all of the structures generated in this example. This is because the atoms forming this substructure were locked against certain modifications in the captopril.sdf file. You can read how atom locking works in the introductory tutorial this example is taken from.
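Atom locking boils down to a bitmask test: in the AddFragment operator shown later on this page, an atom counts as open for attachment only if MolpherAtom.NO_ADDITION & atm_molpher.locking_mask evaluates to zero. The sketch below mimics that check with Python's enum.IntFlag. The Lock class, its flag names and values, and the is_allowed helper are hypothetical stand-ins, not Molpher-lib's actual constants.

```python
from enum import IntFlag


class Lock(IntFlag):
    """Hypothetical lock flags; names and values are illustrative, not Molpher-lib's."""
    NONE = 0
    NO_MUTATION = 1
    NO_ADDITION = 2
    NO_REMOVAL = 4
    FULL_LOCK = NO_MUTATION | NO_ADDITION | NO_REMOVAL


def is_allowed(locking_mask: int, forbidding_flag: Lock) -> bool:
    """A modification is allowed when its forbidding bit is not set in the atom's mask."""
    return not (forbidding_flag & locking_mask)


# Usage: one lock mask per atom; find atoms that may receive new neighbours.
atom_masks = [Lock.FULL_LOCK, Lock.NONE, Lock.NO_MUTATION, Lock.NO_ADDITION]
open_atoms = [i for i, mask in enumerate(atom_masks) if is_allowed(mask, Lock.NO_ADDITION)]
# open_atoms == [1, 2]
```

Because each restriction occupies its own bit, an atom can be locked against some modifications (here, mutation) while remaining open to others.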
Since morphing operators play a crucial role in the generation process, the library makes it possible to implement new operators through the MorphingOperator interface:
```python
from rdkit import Chem
from molpher.core import MolpherMol, MolpherAtom
from molpher.core.morphing import Molpher
from molpher.core.morphing.operators import *
from molpher.random_numbers import get_random_number

class AddFragment(MorphingOperator):
    """
    Attaches a given molecule fragment to an atom in the molecule.
    """

    def __init__(self, fragment, open_atoms_frag, oper_name):
        super(AddFragment, self).__init__()
        self._name = oper_name # name of the operator
        self._fragment = fragment # fragment as RDKit Mol
        self._open_atoms_frag = open_atoms_frag # possible attachment positions on the fragment
        self._orig_rdkit = None # original molecule as RDKit Mol
        self._open_atoms = [] # possible attachment positions on the original molecule

    def setOriginal(self, mol):
        super(AddFragment, self).setOriginal(mol)
        if self.original:
            self._orig_rdkit = self.original.asRDMol()
            self._open_atoms = []

            # an atom is open for attachment if it still carries an implicit hydrogen
            # and is not locked against atom addition
            for atm_rdkit, atm_molpher in zip(self._orig_rdkit.GetAtoms(), self.original.atoms):
                free_bonds = atm_rdkit.GetImplicitValence()
                if free_bonds >= 1 and not (MolpherAtom.NO_ADDITION & atm_molpher.locking_mask):
                    self._open_atoms.append(atm_rdkit.GetIdx())

    def morph(self):
        # merge the original molecule and the fragment into one (disconnected) molecule
        combo_mol = Chem.EditableMol(Chem.CombineMols(
            self._orig_rdkit
            , self._fragment
        ))
        # pick random attachment points; fragment atom indices are offset
        # by the number of atoms in the original molecule
        atom_orig = self._open_atoms[get_random_number(0, len(self._open_atoms)-1)]
        atom_frag = len(self.original.atoms) + self._open_atoms_frag[get_random_number(0, len(self._open_atoms_frag)-1)]
        combo_mol.AddBond(atom_orig, atom_frag, order=Chem.rdchem.BondType.SINGLE)
        combo_mol = combo_mol.GetMol()
        Chem.SanitizeMol(combo_mol)

        # convert back to MolpherMol and copy the locks of the original atoms
        ret = MolpherMol(other=combo_mol)
        for atm_ret, atm_orig in zip(ret.atoms, self.original.atoms):
            atm_ret.locking_mask = atm_orig.locking_mask

        return ret

    def getName(self):
        return self._name

# define a collector -> a callback function that processes morphs as they are generated
strange_patterns = Chem.MolFromSmarts('[S,O,N][F,Cl,Br,I]')
sensible_morphs = dict()

def collect_sensible(morph, operator):
    """
    simple collector, accepts morphs without some weird structural patterns
    """

    rd_morph = morph.asRDMol()
    if not rd_morph.HasSubstructMatch(strange_patterns):
        sensible_morphs[morph.smiles] = morph
        morph.parent_operator = operator.getName() # remember which operator made this morph

# create some AddFragment operators
fragments = ['c1ccccc1', 'C(=O)O']
add_frags = []
for frag in fragments:
    add_frag = AddFragment(Chem.MolFromSmiles(frag), [0], "Add " + frag)
    add_frags.append(add_frag)

# load a molecule from SDF and generate some derived molecules with given morphing operators
mol = MolpherMol("captopril.sdf")
molpher = Molpher(
    mol
    , [ # list of morphing operators to use
        AddAtom()
        , RemoveAtom()
        , MutateAtom()
        , AddBond()
        , RemoveBond()
        , ContractBond()
        , InterlayAtom()
        , RerouteBond()
    ] + add_frags # add our custom operators, too
    , attempts = 100 # create at most 100 molecules
    , collectors = [collect_sensible]
)

# execute morphing and show created molecules
molpher()
# as_mol_grid() is a drawing helper whose definition is not part of this example
as_mol_grid(sensible_morphs.values()) # draw generated structures in a grid
```
The code above uses the MorphingOperator abstract class to implement a customized operator, AddFragment.
Example structures of morphs produced with the code above.
This code is essentially the same as in the previous example, but contains a few more lines that define the new operator itself. Instances of the new operator are used by the Molpher class in the same manner as the built-in operators.
Looking at the definition of the operator itself, we can see that Molpher-lib integrates easily with the RDKit cheminformatics library: MolpherMol instances can be converted to RDKit molecules with a call to the asRDMol method, and an RDKit molecule can be wrapped back into a MolpherMol with MolpherMol(other=...).
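The least obvious line in morph() is the index arithmetic: after Chem.CombineMols, the fragment's atoms follow the original molecule's atoms, so fragment atom i ends up at index len(original atoms) + i. The RDKit-free sketch below reproduces only that bookkeeping on a hypothetical representation (a list of atom symbols plus a list of bonds as index pairs, with bond orders omitted); the combine and attach helpers are illustrative and not how RDKit stores molecules.

```python
from typing import List, Tuple

Bond = Tuple[int, int]


def combine(atoms_a: List[str], bonds_a: List[Bond],
            atoms_b: List[str], bonds_b: List[Bond]):
    """Concatenate two molecules; B's atom indices are shifted by len(atoms_a)."""
    offset = len(atoms_a)
    atoms = atoms_a + atoms_b
    bonds = bonds_a + [(i + offset, j + offset) for i, j in bonds_b]
    return atoms, bonds, offset


def attach(atoms_a: List[str], bonds_a: List[Bond],
           atoms_b: List[str], bonds_b: List[Bond],
           atom_orig: int, atom_frag_local: int):
    """Join the two molecules with one new bond, using the shifted fragment index."""
    atoms, bonds, offset = combine(atoms_a, bonds_a, atoms_b, bonds_b)
    bonds.append((atom_orig, offset + atom_frag_local))
    return atoms, bonds


# Usage: attach a C(=O)O-like fragment via its atom 0 to atom 1 of a C-C-N chain.
atoms, bonds = attach(["C", "C", "N"], [(0, 1), (1, 2)],
                      ["C", "O", "O"], [(0, 1), (0, 2)],
                      atom_orig=1, atom_frag_local=0)
# atoms == ['C', 'C', 'N', 'C', 'O', 'O']
# bonds == [(0, 1), (1, 2), (3, 4), (3, 5), (1, 3)]
```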
Also note the improved collector, which now sets the parent_operator attribute of the generated morphs. The value of this attribute is then used to label the structures in the image, so we can tell which operator produced each of them. You can learn more about implementing operators in the appropriate section of the documentation.
As was the case in the original Molpher approach, Molpher-lib is able to generate a chemical space path from one molecule to another. Using the original Molpher algorithm from the algorithms package, we can, for example, perform a search from cocaine to procaine:
```python
from molpher.algorithms.classic.run import run
from molpher.algorithms.settings import Settings

# our source and target molecules
cocaine = 'CN1C2CCC1C(C(=O)OC)C(OC(=O)c1ccccc1)C2'
procaine = 'O=C(OCCN(CC)CC)c1ccc(N)cc1'

# directory where the path will be stored (as a pickled list)
storage_dir = 'data'

# initialize the exploration settings
settings = Settings(
    source=cocaine
    , target=procaine
    , storage_dir=storage_dir
    , max_threads=4
)

run(settings)
```
If we want to have more control over what actually happens during the search process, we can use the exploration tree API to implement our own algorithm:
```python
from molpher.core import ExplorationTree as ETree

cocaine = 'CN1C2CCC1C(C(=O)OC)C(OC(=O)c1ccccc1)C2'
procaine = 'O=C(OCCN(CC)CC)c1ccc(N)cc1'

tree = ETree.create(source=cocaine, target=procaine) # create the tree

counter = 0
while not tree.path_found:
    counter+=1
    print("Iteration", counter)
    tree.generateMorphs() # generate a new generation of morphs
    tree.sortMorphs() # sort morphs according to their distance to target (ascending)
    tree.filterMorphs() # remove molecules that do not meet certain criteria
    tree.extend() # connect the remaining molecules to the exploration tree
    tree.prune() # remove branches of the tree that do not converge

# as_mol_grid() is a drawing helper whose definition is not part of this example
as_mol_grid(tree.fetchPathTo(tree.params['target']))
```
Molecular structures on a chemical space path between cocaine and procaine.
This algorithm is essentially a reimplementation of the one used in the previous example. The tree is a data structure that keeps track of all the paths explored during the search. We can extend the tree, remove molecules or whole paths, and do much more by performing operations on it. In the code example above, we used the shortcut methods available on the tree instance, but the built-in operations are all defined as separate callable classes under molpher.core.operations. Their behaviour can be adjusted using various settings, and it is also possible to define new operations and use them in a unified manner (see Defining Operations).
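The unified manner mentioned above rests on a simple pattern: each operation is a callable object with a tree attribute, and running it through the tree attaches the tree before the call. The pure-Python sketch below shows that pattern with hypothetical ToyTree, ToyOperation and KeepShortest classes; it is not Molpher-lib code, and the real TreeOperation and ExplorationTree classes do considerably more.

```python
from typing import Iterable, List, Optional


class ToyOperation:
    """Hypothetical base class: an operation is a callable that works on self.tree."""

    def __init__(self, tree: Optional["ToyTree"] = None):
        self.tree = tree

    def setTree(self, tree: "ToyTree") -> None:
        self.tree = tree

    def __call__(self):
        raise NotImplementedError


class ToyTree:
    """Hypothetical tree that only stores a list of candidate SMILES."""

    def __init__(self, candidates: Iterable[str]):
        self.candidates: List[str] = list(candidates)

    def runOperation(self, operation: ToyOperation):
        operation.setTree(self)  # attach this tree, then execute the operation
        return operation()


class KeepShortest(ToyOperation):
    """Example operation: keep only the n shortest candidate SMILES."""

    def __init__(self, n: int, tree: Optional[ToyTree] = None):
        super().__init__(tree)
        self.n = n

    def __call__(self):
        self.tree.candidates = sorted(self.tree.candidates, key=len)[: self.n]


tree = ToyTree(["CCO", "C", "CCCCO", "CN"])
tree.runOperation(KeepShortest(2))
# tree.candidates == ['C', 'CN']
```

Since every operation has the same call signature, an iteration of the search can be expressed as a plain list of operation instances executed in order.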
All operations already implemented in the library are available from the molpher.core.operations package. They share the same interface and can be performed on a tree using its runOperation() method (see the code example below). New operations can be implemented by inheriting from the TreeOperation base class and implementing its __call__() method. In the example below, we show a simple filtering operation:
```python
from molpher.core import ExplorationTree as ETree
from molpher.core.operations import *

cocaine = 'CN1[C@H]2CC[C@@H]1[C@@H](C(=O)OC)[C@@H](OC(=O)c1ccccc1)C2'
procaine = 'O=C(OCCN(CC)CC)c1ccc(N)cc1'

class NitrogenFilter(TreeOperation):

    def __call__(self):
        """
        This method can only be called when a tree is attached to the operation
        (the tree can be specified in the constructor, with the setTree() method or
        simply by writing to the 'tree' attribute of the instance). When the
        runOperation() method is executed, the tree is attached automatically.
        """

        # keep (True) only candidates with at least one nitrogen atom; checking
        # atomic numbers also catches aromatic nitrogens written as 'n' in SMILES
        new_mask = [
            any(atom.GetAtomicNum() == 7 for atom in x.asRDMol().GetAtoms())
            for x in self.tree.candidates
        ]
        self.tree.candidates_mask = new_mask

iteration = [
    GenerateMorphsOper()
    , SortMorphsOper()
    , FilterMorphsOper() # the default filter
    , CleanMorphsOper() # discards morphs that were previously filtered out
    , NitrogenFilter() # our customized filter (overwrites the whole mask)
    , ExtendTreeOper() # connect the remaining structures to the tree
    , PruneTreeOper()
]

tree = ETree.create(source=cocaine, target=procaine)
counter = 0
while not tree.path_found:
    counter+=1
    print("Iteration", counter)
    for oper in iteration:
        tree.runOperation(oper)

# as_mol_grid() is a drawing helper whose definition is not part of this example
as_mol_grid(tree.fetchPathTo(tree.params['target']))
```
The code above implements a custom tree operation (NitrogenFilter) to discard molecules that do not contain nitrogen.
Every tree contains a mask array over the list of candidates currently being evaluated (the list populated by GenerateMorphsOper). The mask marks the structures that should be removed from the list of candidates when the tree is extended or when CleanMorphsOper is called. Tree operations can manipulate this mask and thereby decide which molecules are accepted as the next generation in the evolution. Our customized operation in the example above does not do much: it simply discards generated structures that do not contain nitrogen. A more elaborate filtering scheme could, however, be implemented in the same manner.
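The mask mechanics, and why the order of operations in the iteration list matters, can be illustrated with plain Python lists. In this sketch, which is an illustration and not the library's code, True in the mask means "keep" and False means "remove"; the helper names and the length-based "default" criterion are hypothetical. Because the custom filter in the example above overwrites the whole mask, the morphs rejected by the default filter have to be discarded by CleanMorphsOper first, otherwise they could be re-admitted.

```python
from typing import Callable, List, Tuple


def apply_filter(candidates: List[str], mask: List[bool],
                 predicate: Callable[[str], bool]) -> List[bool]:
    """AND the current mask with a predicate (a 'default filter'-like step)."""
    return [keep and predicate(c) for c, keep in zip(candidates, mask)]


def overwrite_mask(candidates: List[str], predicate: Callable[[str], bool]) -> List[bool]:
    """Build a brand-new mask, ignoring the old one (like the nitrogen filter above)."""
    return [predicate(c) for c in candidates]


def clean(candidates: List[str], mask: List[bool]) -> Tuple[List[str], List[bool]]:
    """Drop masked-out candidates and reset the mask (a CleanMorphsOper-like step)."""
    kept = [c for c, keep in zip(candidates, mask) if keep]
    return kept, [True] * len(kept)


def has_nitrogen(smiles: str) -> bool:
    return "N" in smiles or "n" in smiles  # crude string check, fine for this toy data


def short_enough(smiles: str) -> bool:
    return len(smiles) <= 5  # hypothetical 'default' criterion


# Correct order: default filter, clean, then the overwriting custom filter.
candidates = ["CCN", "CCO", "CCCCCCCCN"]
mask = apply_filter(candidates, [True] * len(candidates), short_enough)
candidates, mask = clean(candidates, mask)
mask = overwrite_mask(candidates, has_nitrogen)
survivors, _ = clean(candidates, mask)
# survivors == ['CCN']

# Wrong order: without the intermediate clean, the overwrite discards the
# default filter's verdict and re-admits the rejected long molecule.
cands2 = ["CCN", "CCO", "CCCCCCCCN"]
mask2 = apply_filter(cands2, [True] * len(cands2), short_enough)
mask2 = overwrite_mask(cands2, has_nitrogen)
wrong, _ = clean(cands2, mask2)
# wrong == ['CCN', 'CCCCCCCCN']
```

A filter that combines its verdict with the existing mask (like apply_filter) would be order-independent; a filter that replaces the mask needs a clean step before it.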
We have shown some common use cases of Molpher-lib, but the library can do much more. For example, it also provides means of traversing the molecules in the tree (or one of its subtrees) and of serializing tree snapshots at any point. Head to the tutorial for a more complete overview of the software.
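Fetching a path and traversing a subtree both rest on the parent/child links stored in the tree. A minimal sketch with a hypothetical dictionary-based tree (SMILES strings as keys; not Molpher-lib's data structure, and path_to and traverse are made-up names) shows the idea: the path to a molecule is recovered by following parent links back to the source, and a subtree is visited breadth-first from any node.

```python
from collections import deque
from typing import Dict, List, Optional


class ToyExplorationTree:
    """Hypothetical tree keyed by SMILES; each node remembers its parent."""

    def __init__(self, source: str):
        self.source = source
        self.parent: Dict[str, Optional[str]] = {source: None}
        self.children: Dict[str, List[str]] = {source: []}

    def add(self, parent: str, child: str) -> None:
        if child in self.parent:
            return  # each molecule is stored only once
        self.parent[child] = parent
        self.children[parent].append(child)
        self.children[child] = []

    def path_to(self, smiles: str) -> List[str]:
        """Walk parent links back to the source; return the path source -> smiles."""
        path = []
        node: Optional[str] = smiles
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path[::-1]

    def traverse(self, root: Optional[str] = None) -> List[str]:
        """Breadth-first visit of the subtree rooted at root (whole tree by default)."""
        start = root if root is not None else self.source
        order, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            queue.extend(self.children[node])
        return order


tree = ToyExplorationTree("CCO")
tree.add("CCO", "CCN")
tree.add("CCO", "CCC")
tree.add("CCN", "CCNC")
# tree.path_to("CCNC") == ['CCO', 'CCN', 'CCNC']
# tree.traverse("CCN") == ['CCN', 'CCNC']
```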