Molpher-lib

New features in the 0.0.0b2dev0 development snapshot.

Initialization of a molecule using a string representation (SMILES and SDF file path are two recognized types at the moment):

For convenience, it is possible to readily convert the MolpherMol molecule to an RDKit Mol:

We can also convert the RDKit molecule back to a MolpherMol instance using the other parameter of the MolpherMol class:

The constructor will automatically recognize an rdkit Mol instance and do all the necessary steps to convert it. We can also just supply another MolpherMol instead:

The code will again do the introspection and perform the necessary steps.

This behaviour will probably change a little in the future. Static factory methods will be more sensible for the task since there is a plethora of different representations that can be used to initialize molecules.

Atom locking

The structure of cymene that we loaded through an SDF file has some of its atoms locked for certain modifications:

The locked atoms are indicated in the SDF as molecule properties. Each property name starts with a prefix MOLPHER_ continued by a 'lock name' which indicates what kind of locking mechanism should be employed. The value of the property is a list of atom numbers that indicates what atoms should be locked in the specified manner. In our example, we decided to forbid addition of new atoms to atoms 2,5,6,7,8 and 9 and we also disabled their removal. Atom number 10 should also not be removed from the molecule during morphing.

It is possible to easily get a hold of the locked atoms in the molecule using a function like this:

The list contains tuples where the first member is the index of the atom in the molecule and the second is an instance of MolpherAtom, a class used to represent atoms in the Molpher-lib library. MolpherAtom instances can be used to query lock information and other atom-related data. If we convert the molecule to an RDKit instance, the atom locks are saved as properties of the molecule:

I think that atoms in RDKit can have properties as well so in the future, the locking information will probably be written directly to the atoms as well as to the molecule as a whole.

So far the actual position of the locked atoms in the molecule has been rather hidden from us so let's take a look now. The following function displays the structure and highlights the locked atoms for us:

This works because the order of atoms is the same in the generated RDKit molecule and the original MolpherMol instance.

In order to get an overview of different locks placed on each atom, we can use the lock_info attribute of MolpherAtom:

The different locks are available as constant members of the MolpherAtom class and can be easily combined with bit operators. The following code example adds a new lock to atom #2 (index 1):

All locks are saved in the locking_mask property. You can always transform this property to a list of lock names using the lockingMaskToString static method:

or you can look at the lock_info again:

You can see that the KEEP_NEIGHBORS and KEEP_BONDS locks were also automatically added to the atom since the lock we set is basically a combination of the two. KEEP_NEIGHBORS will ensure that the neighboring atoms will not be changed or removed while KEEP_BONDS will prevent any changes to the bonds that this atom contributes to.

Let's now redefine the show_locked_atoms function above so that we can display atoms that have specific locks on them:

This function will by default again simply highlight atoms that have any locks in place since the FULL_LOCK lock combines all locks into one:

If we want to highlight specific locks, we can easily do so by supplying the associated value:

Let's now use a morphing operator to generate a random morph of this molecule:

In short, the AddAtom operator basically selects a random atom in the molecule (as allowed by the placed locks) and then adds a random atom to it.

Therefore, we get a different result each time we run the morph method:

We can easily display the created morphs like so:

You can see that the code respects the constraints we specified and that no atoms are added to the aromatic ring system. Therefore, we essentially explored a part of chemical space around a certain scaffold.

We could also generate a simple random path in chemical space with this operator:

Now these molecules are obviously rather crazy since the probability of adding an atom of certain type is the same for all of them at the moment, but there will be a mechanism to affect the ratio in which the added types are selected in the future.

If we were only interested in adding carbon or just a handful of types, we could define our own atom library and make the AddAtom operator use it:

The AddAtom class will have more features in the future to customize the atoms that can be added and in what ratio. This is just a simple example to show how morphing could work using a single operator.

It should also be noted that the placed atom locks are preserved during morphing. Let's modify our display function to also highlight locking information:

Let's now showcase the Molpher class which brings all operators together:

Using all operators obviously can break the aromatic core sometimes since we did not fully protect it against bond breaking and other modifications, but we could easily achieve that by introducing suitable locks, such as KEEP_NEIGHBORS and/or KEEP_BONDS.

The attempts parameter in the molpher class constructor specifies how many morphing attempts the class should perform. Since some operators might not be possible on the given molecules, the number of returned morphs can be smaller than the number of attempts. You can see information about the failed attempts in the error log if there were any.

The class also doesn't keep track of already generated molecules so duplicates can occur. This is the task of the ExplorationTree class which is part of more high level morphing interface and was already described in previous versions.

The Molpher class runs in multithreaded mode and will try to use all available cores on the system. You can change this behaviour with the threads constructor parameter (you might need to run it with more attempts and more complex molecules to really see any noticable difference).

It is also possible to use Molpher instances as infinite morph generators:

Custom Morphing Operators

Because all operator classes are required to implement the MorphingOperator interface, making a customized operator is possible by just defining a class tha implements the required methods (setOriginal and morph):

The IntroduceNitrogen class in this example is rather simple and basically replaces a random carbon atom in the original structure with a nitrogen. But we could definitely imgine more complex logic here that also takes atom locking into account and other things.

Finally, let's plug this class into a slightly modified version of the morphing code we described above and see if it works:

And indeed apart from other expected products of the transformation we also got a few molecules that have a carbon atom replaced for nitrogen. It should be noted that in this case we could also achieve a similar result with just using the MutateAtom operator with a custom atom library. Also note that the structures generated with out operator do not preserver atom locks at the moment, which is something we omitted here for the sake of simplicity.

Summary