Macs in Chemistry

Insanely great science


A Functional Group Count Script

I recently wrote a review of Reaction Workflows, a web-based tool that allows users to build workflows from nodes that provide inputs and outputs or perform actions, including reaction-, scaffold-, and transform-based enumeration. Everything is done within a web browser using drag and drop. Whilst you can draw input structures, one of the real strengths is the ability to import pre-categorised reagent files, e.g. acid chlorides or secondary amines. Workflows comes with a set of pre-categorised reagents, but I'm sure most users will want to include their own proprietary reagents or catalogues of commercial reagents.

This script is intended to help with that categorisation; it uses SMARTS strings to define the queries. If you are not familiar with SMARTS then the Daylight Theory pages are a good starting place. I also find the SMARTSviewer at the University of Hamburg really helpful. There is also a Pascal program, Checkmol, that does something similar.

SMARTS is a language that allows you to specify substructures using rules that are straightforward extensions of SMILES. For example, to search a database for phenol-containing structures, one would use the SMARTS string [OH]c1ccccc1, which should be familiar to those acquainted with SMILES.

The script is a variation of the high-performance substructure search scripts described previously. However, instead of simply flagging the presence (or absence) of a SMARTS query, it counts the number of times the query matches within each molecule. The script uses all available cores to run queries in parallel, so it can handle very large datasets. It currently contains around 70 different SMARTS queries covering both functional groups and atom counts, and I'd be happy to add any suggestions.
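
The worker-and-queue pattern the script relies on can be sketched in plain Python, outside Vortex. This is a minimal illustration, not the Vortex implementation: the names count_matches, worker and parallel_counts are hypothetical, and count_matches is only a text-counting stand-in (it counts occurrences of a token in the SMILES string, which is not real substructure matching). It is written for Python 3, whereas the Jython script below uses the Python 2 module name Queue.

```python
import threading
from queue import Queue  # Python 3 name; the Jython script uses "from Queue import Queue"


def count_matches(smiles, token):
    """Hypothetical stand-in for a SMARTS count: counts text occurrences of token."""
    return smiles.count(token)


def worker(q, results, token):
    """Pull (row, smiles) items off the queue until the None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        row, smiles = item
        results[row] = count_matches(smiles, token)


def parallel_counts(smiles_list, token, n_workers=4):
    """Return one count per input row, computed by a small pool of threads."""
    q = Queue(n_workers * 20)          # bounded queue, as in the script
    results = [0] * len(smiles_list)   # each worker writes only to its own rows
    threads = [threading.Thread(target=worker, args=(q, results, token))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for row, smiles in enumerate(smiles_list):
        q.put((row, smiles))
    for _ in threads:
        q.put(None)                    # one stop sentinel per worker
    for t in threads:
        t.join()
    return results


counts = parallel_counts(["ClCCCl", "CCO", "c1ccccc1Cl"], "Cl")
print(counts)
```

The result is a count per row rather than a yes/no flag, which is the difference from a plain substructure search. Note that threads in standard CPython do not run CPU-bound work on several cores at once; Jython has no global interpreter lock, so the same pattern can use all cores there.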

The result is shown in the screenshot below


The Vortex Script

import java
from com.dotmatics.vortex.mol2img import Mol2Img

from Queue import Queue
from threading import Thread

processorcount = java.lang.Runtime.getRuntime().availableProcessors()

class smilesworker(Thread):
    def __init__(self, q, eval_column):
        Thread.__init__(self)
        self.q = q
        self.eval_column = eval_column

    def run(self):
        while 1:
            row = self.q.get()
            #None is the signal for the worker to stop
            if row is None:
                break
            try:
                vortex_tmp_value = vortex.getMolProperty(vtable.getStructureText(row), "SMILES")
            except:
                vortex_tmp_value = None
            if vortex_tmp_value is None:
                self.eval_column.setValueFromString(row, None)
            else:
                self.eval_column.setValueFromString(row, str(vortex_tmp_value))

#Patterns here
patterns = [
#Carbon functional groups
('aro', '[a]'),
('acetylene', 'C#[CH1]'),
('carbonyl', '[CX3]=[OX1]'),
#urea will count as 2 amides
('amide', '[OX1]=CN'),
#old school HBA/D model (Count N,O and N or O bearing H)
('HBA', '[#7,#8]'),
('HBD', '[OX2H,NX3H,NX2H]')
]

class match_multiple(ProgressRunnable):
    def __init__(self):
        self.useMatchCount = 0
        self.calcSMILES = False
        self.nostructure = False
        self.structureColumn = vtable.findColumnWithName("SMILES")
        if self.structureColumn is None:
            self.calcSMILES = True

        if self.calcSMILES and vtable.findColumnWithName(vtable.MolfileColumn) is None:
            vortex.alert("You need an SD file or a SMILES column")
            self.nostructure = True

    def doCalcSmiles(self):
        self.structureColumn.setValueFromString(vtable.getRealRowCount() - 1, None)
        q = Queue(processorcount * 20)
        #The workers
        t = []
        #Create workers
        for i in range(0, processorcount):
            t.append(smilesworker(q, self.structureColumn))

        #Start the workers
        for i in range(0, processorcount):
            t[i].start()

        #Load the Q
        for row in range(0, vtable.getRealRowCount()):
            q.put(row)

        #Something to tell the workers to stop
        for i in range(0, processorcount):
            q.put(None)

        #Wait for the workers to finish
        for i in range(processorcount):
            t[i].join()
    def updateProgress(self, perc, message):
        #The body of this method is missing from this listing; see the downloadable script
        pass

    def run(self):
        if not self.nostructure:
            self.updateProgress(0, 'Calculating SMILES')
            if (self.calcSMILES):
                self.structureColumn = vtable.findColumnWithName("SMILES", 1, vortex.STRING)
                self.doCalcSmiles()
            self.updateProgress(0, 'Indexing SMILES (for performance)')
            Mol2Img.doSearch(self.structureColumn, '[U].Cl.F.Br.N.O.S', 'nomdl', 1)

            results = []
            for i in range(0, vtable.getRealRowCount()):
                #The body of this loop is missing from this listing
                pass
            message = ''
            ttotal = 0
            for i in range(0, len(patterns)):
                self.updateProgress(int(100 * (float(i) / float(len(patterns)))), patterns[i][0])
                #hits is keyed by row index and holds the match count for this pattern
                hits = Mol2Img.doSearch(self.structureColumn, patterns[i][1], 'nomdl', 1)
                mycol = vtable.findColumnWithName(patterns[i][0] + "_count", 1, vortex.INT)
                for row in range(vtable.getRealRowCount()):
                    if hits.containsKey(row):
                        mycol.setInt(row, hits[row])
                    else:
                        mycol.setInt(row, 0)

if vws is None:
    vortex.alert("You must have a workspace loaded...")
else:
    matcher = match_multiple()
    vortex.run(matcher, "Generating matches")

The script can be downloaded from here
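
Once the count columns exist, they can be used to pick out a reagent class. The sketch below shows one way this might be done in plain Python, outside Vortex. The column names follow the script's pattern-name-plus-"_count" convention, but the function select_reagents is hypothetical and the rows are hand-written illustrative values, not output from the script.

```python
def select_reagents(rows, exactly_one, absent=()):
    """Keep rows with a count of exactly 1 in every 'exactly_one' column
    and a count of 0 in every 'absent' column (missing columns count as 0)."""
    selected = []
    for row in rows:
        has_single_group = all(row.get(col, 0) == 1 for col in exactly_one)
        lacks_unwanted = all(row.get(col, 0) == 0 for col in absent)
        if has_single_group and lacks_unwanted:
            selected.append(row)
    return selected


# Hand-written example rows (illustrative values only)
rows = [
    {"SMILES": "C#CCO", "acetylene_count": 1, "amide_count": 0},
    {"SMILES": "C#CCC#C", "acetylene_count": 2, "amide_count": 0},
    {"SMILES": "C#CCC(=O)N", "acetylene_count": 1, "amide_count": 1},
    {"SMILES": "CCO", "acetylene_count": 0, "amide_count": 0},
]

# Terminal alkynes with a single alkyne group and no amide
alkynes = select_reagents(rows, exactly_one=["acetylene_count"], absent=["amide_count"])
print([row["SMILES"] for row in alkynes])
```

Filtering on a count of exactly one is something a simple presence/absence flag cannot do: it excludes bifunctional reagents that would otherwise react twice in an enumeration.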

Last Updated 21 June 2017