1.1.1.1. molpher.core package

This package contains the most essential features of the library. It provides everything that the user of the library will need. All the modules and classes below and their contents are meant to be imported by external scripts.

1.1.1.1.1. Modules

1.1.1.1.1.1. molpher.core.ExplorationData module

This module houses the ExplorationData class:

class molpher.core.ExplorationData.ExplorationData(other=None, **kwargs)[source]

Bases: molpher.swig_wrappers.core.ExplorationData

Parameters:

Note

If both other and **kwargs are specified, then everything in **kwargs will be applied after the instance in other is wrapped.

This is a specialized version of the molpher.swig_wrappers.core.ExplorationData proxy class. It implements some additional functionality for ease of use from Python.

It contains all the information needed to initialize an ExplorationTree instance. Additionally, any tree can be transformed into an instance of this class by calling the asData() method.

One advantage of this class over the ExplorationTree is that it allows direct modifications of the exploration tree structure. This is especially useful when we want to create an initial tree topology before the exploration itself.

Warning

Note that the current implementation of the modification methods is experimental and may result in undefined behaviour. Therefore, it is only recommended to use this class as a means of setting morphing parameters and spawning tree instances, or of spawning new trees from existing ones without the need to create a snapshot file.

Because it inherits from molpher.swig_wrappers.core.ExplorationData, it provides the same interface as the corresponding C++ class, but it also exposes the morphing parameters as object attributes for ease of use. These attributes follow a slightly different naming convention than the corresponding getters and setters of the parent class: their names are derived from the parameter names used in the XML template files, which are more self-explanatory and easier to remember and type. The table below gives an overview of all available parameters, their default values, short descriptions, and the respective getters and setters of the base class:

Table 1.1 Morphing parameters recognized by the current version.
Attribute | Default Value | Brief Description | Setter | Getter
source | None | SMILES of the source molecule. | setSource | getSource
target | None | SMILES of the target molecule. | setTarget | getTarget
operators | a tuple of selectors [1] | A tuple of identifiers of the permitted chemical operators. | setChemicalOperators | getChemicalOperators
accept_max | 100 | Maximum number of candidates accepted at once (based on their position in ExplorationTree.candidates). | setCntCandidatesToKeepMax | getCntCandidatesToKeepMax
accept_min | 50 | Minimum number of candidates accepted during probability filtering. | setCntCandidatesToKeep | getCntCandidatesToKeep
close_produce | 150 | Maximum number of morphs to produce with an ExplorationTree.generateMorphs() call when close to the target molecule. | setCntMorphsInDepth | getCntMorphsInDepth
far_produce | 80 | Maximum number of morphs to produce with an ExplorationTree.generateMorphs() call. | setCntMorphs | getCntMorphs
far_close_threshold | 0.15 | Molecular distance below which the target molecule and a morph are considered to be close. | setDistToTargetDepthSwitch | getDistToTargetDepthSwitch
fingerprint | FP_MORGAN | Identification string of the current fingerprint strategy. | setFingerprint | getFingerprint
similarity | SC_TANIMOTO | Identification string of the current similarity coefficient. | setSimilarityCoefficient | getSimilarityCoefficient
max_morphs_total | 1500 | Maximum number of morphs allowed to be derived from one molecule and the allowed number of non-producing descendants before a molecule is removed from the tree. | setCntMaxMorphs | getCntMaxMorphs
non_producing_survive | 5 | Number of iterations before descendants of a non-producing molecule are removed from the tree. | setItThreshold | getItThreshold
weight_max | 100000.0 | Maximum molecular weight of one morph. | setMaxAcceptableMolecularWeight | getMaxAcceptableMolecularWeight
weight_min | 0.0 | Minimum molecular weight of one morph. | setMinAcceptableMolecularWeight | getMinAcceptableMolecularWeight
[1] (OP_ADD_ATOM, OP_ADD_BOND, OP_BOND_CONTRACTION, OP_BOND_REROUTE, OP_INTERLAY_ATOM, OP_MUTATE_ATOM, OP_REMOVE_ATOM, OP_REMOVE_BOND)
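
The relationship between the attribute names and the getter and setter pairs in Table 1.1 can be illustrated with a small sketch. It does not import the library: StandInBase is a hypothetical stand-in for the SWIG proxy class, forwarded() and AttributeView are names invented for this example, and only two rows of the table are modelled. The actual class may be implemented differently.

```python
class StandInBase:
    """Hypothetical stand-in for the SWIG proxy class (not part of the library)."""

    def __init__(self):
        # Default values taken from Table 1.1.
        self._cnt_morphs = 80
        self._cnt_candidates_to_keep_max = 100

    def getCntMorphs(self):
        return self._cnt_morphs

    def setCntMorphs(self, value):
        self._cnt_morphs = value

    def getCntCandidatesToKeepMax(self):
        return self._cnt_candidates_to_keep_max

    def setCntCandidatesToKeepMax(self, value):
        self._cnt_candidates_to_keep_max = value


def forwarded(getter_name, setter_name):
    """Build a property that forwards reads and writes to the named methods."""
    def fget(self):
        return getattr(self, getter_name)()

    def fset(self, value):
        getattr(self, setter_name)(value)

    return property(fget, fset)


class AttributeView(StandInBase):
    """Exposes two parameters under the attribute names from Table 1.1."""
    far_produce = forwarded("getCntMorphs", "setCntMorphs")
    accept_max = forwarded("getCntCandidatesToKeepMax", "setCntCandidatesToKeepMax")


data = AttributeView()
data.far_produce = 100        # same effect as data.setCntMorphs(100)
print(data.getCntMorphs())    # the getter sees the value written via the attribute
print(data.accept_max)        # an untouched parameter keeps its default
```

In this sketch the attribute and the getter and setter pair always agree, because the attribute holds no state of its own.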
exception UnknownParameterException[source]

Bases: Exception

Indicates that an unknown parameter was supplied.

ExplorationData.accept_max

The maximum number of morphs allowed to be connected to the tree upon one call to extend().

If more than accept_max morphs with True in the appropriate position of candidates_mask are present in candidates and extend() is called, only the first accept_max morphs from candidates will be connected to the tree and the rest will be discarded.

Returns:maximum number of candidates accepted upon extend()
Return type:int
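
The truncation described above can be sketched in plain Python. The function select_accepted() is a hypothetical helper written for this example and is not part of the library; it assumes that "the first accept_max morphs" means the first accept_max candidates whose mask entry is True, in the order in which they appear in candidates.

```python
def select_accepted(candidates, candidates_mask, accept_max):
    """Return the candidates that would be connected to the tree.

    Keeps only candidates whose mask entry is True, then truncates
    the result to the first accept_max items.
    """
    if len(candidates) != len(candidates_mask):
        raise ValueError("mask must have one entry per candidate")
    selected = [c for c, keep in zip(candidates, candidates_mask) if keep]
    return selected[:accept_max]


# SMILES strings stand in for MolpherMol instances here.
morphs = ["CCO", "CCN", "CCC", "CCCl", "CCBr"]
mask = (True, False, True, True, True)
accepted = select_accepted(morphs, mask, accept_max=3)
print(accepted)
```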
ExplorationData.accept_min

If FilterMorphsOper.PROBABILITY is used during filtering, this is the number of morphs accepted with 100% probability.

Returns:minimum number of candidates accepted during probability filtering
Return type:int
ExplorationData.close_produce

This is the maximum number of morphs generated from one leaf when the leaf of the tree currently being processed with generateMorphs() lies less than far_close_threshold from the target molecule.

Returns:maximum number of morphs to produce with a generateMorphs() call when close to the target molecule
Return type:int
ExplorationData.far_close_threshold

This distance threshold controls how many morphs generateMorphs() produces for molecules that are close to or far from the target molecule. Morphs whose distance from the target molecule is lower than far_close_threshold are considered to be close.

See also

far_produce and close_produce

Returns:distance threshold for far_produce and close_produce
Return type:float
ExplorationData.far_produce

The maximum number of morphs generated from one leaf when the leaf of the tree currently being processed with generateMorphs() lies more than far_close_threshold from the target molecule.

Returns:maximum number of morphs to produce with a generateMorphs() call
Return type:int
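
How far_close_threshold switches between far_produce and close_produce can be summarized in a few lines. The function morph_budget() below is a hypothetical helper for illustration only; its defaults are the values from Table 1.1. The text does not say what happens when the distance is exactly equal to the threshold, so treating that case as "far" is an assumption of this sketch.

```python
def morph_budget(dist_to_target, far_close_threshold=0.15,
                 far_produce=80, close_produce=150):
    """Return the maximum number of morphs to generate from one leaf."""
    if dist_to_target < far_close_threshold:
        # The leaf is close to the target molecule.
        return close_produce
    # The leaf is far from the target (equality is treated as far here).
    return far_produce


print(morph_budget(0.10))
print(morph_budget(0.50))
```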
ExplorationData.fingerprint

Returns an identifier of the currently used molecular fingerprint.

Table 1.2 Currently supported molecular fingerprints.
Identifier Description
FP_ATOM_PAIRS
FP_MORGAN
FP_TOPOLOGICAL
FP_TOPOLOGICAL_LAYERED_1
FP_TOPOLOGICAL_LAYERED_2
FP_VECTORFP
FP_TOPOLOGICAL_TORSION
FP_EXT_ATOM_PAIRS
FP_EXT_MORGAN
FP_EXT_TOPOLOGICAL
FP_EXT_TOPOLOGICAL_LAYERED_1
FP_EXT_TOPOLOGICAL_LAYERED_2
FP_EXT_TOPOLOGICAL_TORSION
Returns:molecular fingerprint identifier
Return type:str
ExplorationData.is_valid

Indicates whether this instance represents valid parameters. The instance becomes invalid if any parameter values are bad or nonsensical, if values are missing (such as undefined chemical operators), or if the tree structure is unacceptable for any reason.

Returns:True for a valid instance, False for invalid
Return type:bool
static ExplorationData.load(snapshot)[source]

A factory method to create an instance of ExplorationData from a tree snapshot.

Parameters:snapshot (str) – path to the snapshot file
Returns:new instance representing the data loaded from the snapshot file
Return type:ExplorationData
ExplorationData.max_morphs_total

This value is the maximum number of morphs allowed to be generated from one molecule. If the number of generated morphs exceeds this number, all additional morphs can be filtered out using the FilterMorphsOper.MAX_DERIVATIONS filter.

It is also the maximum number of ‘bad morphs’ generated from one molecule. If a molecule has more than max_morphs_total descendants and none of them are closer to the target molecule than the molecule in question, then the molecule is permanently removed from the tree with all of its descendants when prune() is called.

Returns:maximum number of ‘bad morphs’ before pruning
Return type:int
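
The 'bad morphs' rule can be expressed as a simple predicate. The function removed_by_prune() is a hypothetical helper that only restates the two conditions from the description above; it does not reproduce how prune() is implemented, and it assumes that each descendant is represented by its distance to the target molecule.

```python
def removed_by_prune(own_distance, descendant_distances, max_morphs_total=1500):
    """Apply the 'bad morphs' rule described above to one molecule."""
    # Condition 1: more than max_morphs_total descendants.
    too_many = len(descendant_distances) > max_morphs_total
    # Condition 2: no descendant is closer to the target than the molecule.
    none_closer = all(d >= own_distance for d in descendant_distances)
    return too_many and none_closer


# A small limit is used so that the example stays readable.
print(removed_by_prune(0.4, [0.5, 0.6, 0.45], max_morphs_total=2))
print(removed_by_prune(0.4, [0.5, 0.3, 0.45], max_morphs_total=2))
print(removed_by_prune(0.4, [0.5, 0.6], max_morphs_total=2))
```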
ExplorationData.non_producing_survive

A molecule that has not produced any morphs closer to the target molecule than itself (a non-producing molecule) for non_producing_survive number of calls to extend() will have its descendants removed during the next prune() call.

Returns:number of calls to generateMorphs() before descendants of a non-producing molecule are removed from the tree
Return type:int
ExplorationData.operators

A set of chemical operators to use. These define how the input molecule and its descendants can be manipulated during morphing.

Can be set using an iterable of the appropriate selectors or their names as str. Any duplicates are automatically removed.

Table 1.3 Currently supported chemical operators.
Identifier Description
OP_ADD_ATOM Add a random atom to the molecule.
OP_REMOVE_ATOM Remove an atom from the molecule.
OP_ADD_BOND Add a bond between two random atoms.
OP_REMOVE_BOND Remove a bond between two random atoms.
OP_MUTATE_ATOM Change a randomly selected atom to a different element.
OP_INTERLAY_ATOM
OP_BOND_REROUTE
OP_BOND_CONTRACTION
Returns:names of the current chemical operators
Return type:tuple of str
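
The following sketch shows one way in which a mixed iterable of selectors and names could be reduced to a duplicate-free tuple of names. It does not import the library: the integer values are copied from the molpher.core.selectors listing, and normalize_operators() is a hypothetical helper. Keeping the items in first-seen order is an assumption; the library may order the result differently.

```python
# Selector values as listed for molpher.core.selectors.
OPERATORS = {
    "OP_ADD_ATOM": 0,
    "OP_REMOVE_ATOM": 1,
    "OP_ADD_BOND": 2,
    "OP_REMOVE_BOND": 3,
    "OP_MUTATE_ATOM": 4,
    "OP_INTERLAY_ATOM": 5,
    "OP_BOND_REROUTE": 6,
    "OP_BOND_CONTRACTION": 7,
}
NAMES_BY_VALUE = {value: name for name, value in OPERATORS.items()}


def normalize_operators(items):
    """Turn selectors (int) or names (str) into a duplicate-free tuple of names."""
    names = []
    for item in items:
        if isinstance(item, str):
            if item not in OPERATORS:
                raise ValueError(f"unknown operator name: {item!r}")
            name = item
        else:
            if item not in NAMES_BY_VALUE:
                raise ValueError(f"unknown operator selector: {item!r}")
            name = NAMES_BY_VALUE[item]
        if name not in names:  # drop duplicates, keep first-seen order
            names.append(name)
    return tuple(names)


print(normalize_operators(["OP_ADD_ATOM", 0, 3, "OP_REMOVE_BOND"]))
```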
ExplorationData.param_dict

Holds a dictionary of the current morphing parameter values for this instance. A new dictionary of parameters can be assigned to change them.

Returns:a dictionary of parameters
Return type:dict
ExplorationData.similarity

Returns an identifier of the currently used similarity measure.

Table 1.4 Currently supported similarity measures.
Identifier Description
SC_ALL_BIT
SC_ASYMMETRIC
SC_BRAUN_BLANQUET
SC_COSINE
SC_DICE
SC_KULCZYNSKI
SC_MC_CONNAUGHEY
SC_ON_BIT
SC_RUSSEL
SC_SOKAL
SC_TANIMOTO
SC_TVERSKY_SUBSTRUCTURE
SC_TVERSKY_SUPERSTRUCTURE
Returns:similarity measure identifier
Return type:str
ExplorationData.source

The source molecule. All morphs in an exploration tree are derived from this molecule during morphing. This is the root of the created tree.

Can be set using a MolpherMol instance or a SMILES string of the new source molecule.

Returns:current source molecule
Return type:MolpherMol
ExplorationData.target

The target molecule. This is the molecule being searched for during morphing. In the original version of the algorithm the goal is to maximize similarity (minimize structural distance) of the generated morphs and this molecule.

Can be set using a MolpherMol instance or a SMILES string of the new target molecule.

Returns:current target molecule
Return type:MolpherMol
ExplorationData.weight_max

If the FilterMorphsOper.WEIGHT filter is used on an exploration tree, this will be the maximum weight of the candidate morphs accepted during a filtering procedure.

Returns:maximum acceptable weight during filtering
Return type:float
ExplorationData.weight_min

If the FilterMorphsOper.WEIGHT filter is used on an exploration tree, this will be the minimum weight of the candidate morphs accepted during a filtering procedure.

Returns:minimum acceptable weight during filtering
Return type:float

1.1.1.1.1.2. molpher.core.ExplorationTree module

class molpher.core.ExplorationTree.Callback(callback)[source]

Bases: molpher.swig_wrappers.core.TraverseCallback

Parameters:callback (any callable object with one required parameter) – the callable to call every time a molecule is encountered during traversal

Basic callback class used to traverse the tree with the ExplorationTree.traverse() method.

It registers a callable and calls it every time a morph is processed.

class molpher.core.ExplorationTree.ExplorationTree[source]

Bases: molpher.swig_wrappers.core.ExplorationTree

This is a specialized version of the molpher.swig_wrappers.core.ExplorationTree proxy class. It implements some additional functionality for ease of use from Python.

Attention

This class has no constructor defined. Use the create() factory method to obtain instances of this class.

asData()[source]
Returns:the tree as an ExplorationData instance
Return type:ExplorationData
candidates
Returns:the candidate morphs (morphs generated by a single call to generateMorphs())
Return type:tuple of MolpherMol instances
candidates_mask

A tuple of bool objects that serve as means of filtering the candidate morphs. Each morph in candidates has a bool variable assigned to it in this tuple – only morphs with True at the appropriate position are added to the tree when extend() is called.

It can be changed by assigning a new tuple or a call to setCandidateMorphsMask().

Returns:currently selected candidate morphs represented as a tuple of bool objects
Return type:tuple
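
A mask of this kind can be built from any per-morph criterion. The sketch below keeps candidates within a chosen distance of the target. Morph is a hypothetical stand-in for MolpherMol that carries only the two attributes used here, and build_mask() is a helper invented for this example. With a real tree, the resulting tuple would be assigned to candidates_mask.

```python
from collections import namedtuple

# Hypothetical stand-in for MolpherMol with only the attributes used here.
Morph = namedtuple("Morph", ["smiles", "dist_to_target"])


def build_mask(candidates, max_distance):
    """Return a tuple of bool with True for candidates within max_distance."""
    return tuple(m.dist_to_target <= max_distance for m in candidates)


candidates = (Morph("CCO", 0.42), Morph("CCN", 0.18), Morph("CCC", 0.73))
mask = build_mask(candidates, max_distance=0.5)
print(mask)
```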
static create(tree_data=None, source=None, target=None, callback_class=<class 'molpher.core.ExplorationTree.Callback'>)[source]

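Creates a new exploration tree, either from tree_data or, when tree_data is not specified, from the given source and target.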

Parameters:

Note

When tree_data is specified, source and target are always ignored.

fetchMol(canonSMILES)[source]

Returns a molecule from the tree using a canonical SMILES string.

Raises a RuntimeError if the molecule is not found.

Parameters:canonSMILES (str) – SMILES string of the molecule to fetch
Returns:molecule from a tree
Return type:MolpherMol
generation_count
Returns:Number of morph generations connected to the tree so far.
Return type:int
leaves
Returns:the current leaves of the tree
Return type:tuple of MolpherMol instances
params

A dictionary representing the current exploration parameters.

It is possible to assign a new dictionary (or an instance of the molpher.swig_wrappers.core.ExplorationData class) to update the current parameters.

Note

Only the parameters defined in the supplied dictionary are changed. If an instance of molpher.swig_wrappers.core.ExplorationData is supplied, only the parameters are read from it (the tree structure remains the same).

Returns:current parameters
Return type:dict
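
The partial-update behaviour described in the note can be sketched with ordinary dictionaries. Nothing here calls the library: DEFAULTS holds a few default values from Table 1.1, and updated_params() and UnknownParameterError are hypothetical names. Rejecting unknown keys is an assumption of this sketch; the text above does not state how params reacts to them.

```python
class UnknownParameterError(KeyError):
    """Hypothetical error type used only in this sketch."""


# A subset of the defaults from Table 1.1.
DEFAULTS = {
    "accept_max": 100,
    "accept_min": 50,
    "close_produce": 150,
    "far_produce": 80,
    "far_close_threshold": 0.15,
}


def updated_params(current, changes):
    """Return a copy of current in which only the keys in changes differ."""
    unknown = set(changes) - set(current)
    if unknown:
        raise UnknownParameterError(", ".join(sorted(unknown)))
    merged = dict(current)   # leave the original dictionary untouched
    merged.update(changes)   # overwrite only the supplied keys
    return merged


params = updated_params(DEFAULTS, {"far_produce": 120})
print(params["far_produce"], params["accept_max"])
```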
path_found
Returns:True if the target molecule is present in the tree, False otherwise.
Return type:bool
thread_count
Returns:maximum number of threads this instance will use
Return type:int
traverse(callback, start_mol=None)[source]

This method traverses the whole tree structure (or just a subtree) from the root to the leaves. It takes a callback that accepts a single required argument, starts at the root of the tree (or at the root of a specified subtree, see start_mol), and calls the callback with each encountered morph as its argument.

Warning

The tree traversal is implemented in C++ and the callback to Python is realized using SWIG’s director feature, which makes it possible to keep the implementation concurrent and efficient. However, there is a problem:

If a reference to the MolpherMol instance is saved into a variable that outlives the call to the callback function, this reference then becomes invalid when the call is finished. Therefore, doing something like this:

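var = None
def callback(morph):
    global var  # without this, the assignment below would make var local
    if var:
        print("Previous:", var.smiles)
    var = morph  # unsafe: this reference outlives the callback
tree.traverse(callback)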

will likely result in a segmentation fault upon the second call to the callback function, because var will no longer refer to valid memory. This is likely a result of SWIG freeing the pointer without taking the existing reference from Python into account.

One way to work around this is to just save the SMILES string of the molecule and then fetch the reference to it using the fetchMol() method.

Parameters:
  • callback (a callable object that takes a single argument) – the callback to call
  • start_mol (str or MolpherMol) – the root of a subtree to explore as canonical SMILES or MolpherMol instance
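
The workaround from the warning above (store the SMILES string and fetch the molecule again later) can be sketched as follows. So that the example runs without the library, StandInMol and StandInTree are hypothetical stand-ins that mimic only traverse() and fetchMol(); their visiting order and the restriction of start_mol to objects are assumptions of the sketch, not properties of the real classes.

```python
class StandInMol:
    """Hypothetical stand-in for MolpherMol (not part of the library)."""

    def __init__(self, smiles, children=()):
        self.smiles = smiles
        self.children = list(children)


class StandInTree:
    """Hypothetical stand-in that mimics traverse() and fetchMol()."""

    def __init__(self, root):
        self.root = root

    def traverse(self, callback, start_mol=None):
        # Depth-first walk from the root (or from start_mol) to the leaves.
        stack = [start_mol if start_mol is not None else self.root]
        while stack:
            mol = stack.pop()
            callback(mol)
            stack.extend(reversed(mol.children))

    def fetchMol(self, smiles):
        found = []
        self.traverse(lambda m: found.append(m) if m.smiles == smiles else None)
        if not found:
            raise RuntimeError("molecule not found: " + smiles)
        return found[0]


tree = StandInTree(
    StandInMol("C", [StandInMol("CC", [StandInMol("CCC")]), StandInMol("CO")])
)

visited = []  # holds plain strings only


def callback(morph):
    # Copy the SMILES string; do not keep the morph object itself.
    visited.append(morph.smiles)


tree.traverse(callback)
# Look the molecule up again by SMILES once the traversal has finished.
last = tree.fetchMol(visited[-1])
print(visited, last.smiles)
```

Because only strings leave the callback, nothing that the traversal may free is referenced afterwards.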

1.1.1.1.1.3. molpher.core.MolpherMol module

class molpher.core.MolpherMol.MolpherMol(smiles=None, other=None)[source]

Bases: molpher.swig_wrappers.core.MolpherMol

Parameters:
  • smiles (str) – smiles of the molecule that is to be created
  • other (molpher.swig_wrappers.core.MolpherMol or its derived class) – another instance, the new instance will be its copy (tree ownership is not transferred onto the copy)

This is a specialized version of the molpher.swig_wrappers.core.MolpherMol proxy class. It implements some additional functionality for ease of use from Python.

copy()[source]

Returns a copy of this instance. If this instance has a tree assigned, the returned copy will have None instead.

Returns:a copy of this instance
Return type:MolpherMol
descendents

Canonical SMILES strings of all molecules derived from this compound that are currently present in the tree.

Returns:
Return type:str
dist_to_target

The value of the objective function. In the original implementation, this is the structural distance to the target molecule using a similarity measure.

This value can be changed.

Returns:value of the objective function
Return type:float
gens_without_improvement

Number of morph generations derived from this molecule that did not contain any morph with a better value of the objective function (by default, the distance to the target molecule) than this molecule.

This value can be changed.

Returns:number of non-producing generations
Return type:int
historic_descendents

Canonical SMILES strings of all molecules derived from this compound.

Returns:
Return type:str
parent_operator

The name of the chemical operator selector that led to the creation of this molecule.

Returns:name of the parent chemical operator
Return type:str
parent_smiles

Canonical SMILES string of the parent molecule in the tree.

Can be an empty str if the molecule is the root of a tree or is not associated with any tree.

Returns:canonical SMILES string of the parent molecule in the tree
Return type:str
sascore

The synthetic feasibility score of the molecule according to Ertl.

This value can be changed.

Returns:synthetic feasibility score
Return type:float
smiles
Returns:canonical SMILES string of this molecule
Return type:str
tree

A reference to the tree this instance is currently in. If the molecule is not present in any tree, this value is None.

Returns:reference to the tree this instance is currently in
Return type:ExplorationTree or None

1.1.1.1.1.4. molpher.core.selectors module

Contains all global selectors that are usually used when creating an exploration tree or setting any of its parameters during runtime.

There are three types of selectors:

  1. fingerprint selectors

    Their names are prepended with ‘FP_’ and are used to either set the fingerprint member of the ExplorationData class or as the value of the fingerprint key when calling create() with the params parameter or writing into the params member of the ExplorationTree.

    FP_MORGAN is the default option.

  2. similarity coefficient selectors

    Their names are prepended with ‘SC_’ and are used to either set the similarity member of the ExplorationData class or as the value of the similarity key when calling create() with the params parameter or writing into the params member of the ExplorationTree.

    SC_TANIMOTO is the default option.

  3. chemical operators

    Their names are prefixed with ‘OP_’ and an iterable of them is used to either set the operators member of the ExplorationData class or as the items of the iterable assigned to the operators key when calling create() with the params parameter or writing into the params member of the ExplorationTree.

    All of the available selectors are used by default.

molpher.core.selectors.FP_ATOM_PAIRS = 0
molpher.core.selectors.FP_EXT_ATOM_PAIRS = 7
molpher.core.selectors.FP_EXT_MORGAN = 8
molpher.core.selectors.FP_EXT_TOPOLOGICAL = 9
molpher.core.selectors.FP_EXT_TOPOLOGICAL_LAYERED_1 = 10
molpher.core.selectors.FP_EXT_TOPOLOGICAL_LAYERED_2 = 11
molpher.core.selectors.FP_EXT_TOPOLOGICAL_TORSION = 12
molpher.core.selectors.FP_MORGAN = 1

This is the default selector that is used when no other option is specified.

molpher.core.selectors.FP_TOPOLOGICAL = 2
molpher.core.selectors.FP_TOPOLOGICAL_LAYERED_1 = 3
molpher.core.selectors.FP_TOPOLOGICAL_LAYERED_2 = 4
molpher.core.selectors.FP_TOPOLOGICAL_TORSION = 6
molpher.core.selectors.FP_VECTORFP = 5
molpher.core.selectors.OP_ADD_ATOM = 0
molpher.core.selectors.OP_ADD_BOND = 2
molpher.core.selectors.OP_BOND_CONTRACTION = 7
molpher.core.selectors.OP_BOND_REROUTE = 6
molpher.core.selectors.OP_INTERLAY_ATOM = 5
molpher.core.selectors.OP_MUTATE_ATOM = 4
molpher.core.selectors.OP_REMOVE_ATOM = 1
molpher.core.selectors.OP_REMOVE_BOND = 3
molpher.core.selectors.SC_ALL_BIT = 0
molpher.core.selectors.SC_ASYMMETRIC = 1
molpher.core.selectors.SC_BRAUN_BLANQUET = 2
molpher.core.selectors.SC_COSINE = 3
molpher.core.selectors.SC_DICE = 4
molpher.core.selectors.SC_KULCZYNSKI = 5
molpher.core.selectors.SC_MC_CONNAUGHEY = 6
molpher.core.selectors.SC_ON_BIT = 7
molpher.core.selectors.SC_RUSSEL = 8
molpher.core.selectors.SC_SOKAL = 9
molpher.core.selectors.SC_TANIMOTO = 10

This is the default selector that is used when no other option is specified.

molpher.core.selectors.SC_TVERSKY_SUBSTRUCTURE = 11
molpher.core.selectors.SC_TVERSKY_SUPERSTRUCTURE = 12