When working with multiple data sets of molecules, particularly when combining them from several sources, one of the most common tasks is removing duplicates. This can be a time-consuming and error-prone process if carried out manually, and this script should make it a much easier task.
The script uses InChIKeys to identify potential duplicate structures. The 27-character standard InChIKey is a hashed version of the full standard InChI and was designed to allow easy searching for chemical compounds. By contrast, standard InChI strings for large molecules may contain more than 1000 characters. For more details on the InChIKey, see J Cheminform. 2013; 5: 7 DOI.
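To make the 27-character layout concrete, here is a minimal sketch in plain Python (outside Vortex) that checks whether a string has the shape of a standard InChIKey: 14 letters, a hyphen, 8 letters followed by the standard flag "S" and a version letter, a hyphen, and one protonation letter. The helper name looks_like_standard_inchikey is hypothetical and not part of Vortex; it only tests the format, not whether the key matches a real structure.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 8 letters + 'S' (standard flag)
# + 1 version letter, hyphen, 1 protonation letter = 27 characters in total.
STANDARD_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}S[A-Z]-[A-Z]$")


def looks_like_standard_inchikey(key):
    """Hypothetical helper: True if key has the layout of a standard InChIKey."""
    if key is None:
        return False
    return STANDARD_INCHIKEY.match(key.strip()) is not None


# The InChIKey of aspirin has the expected layout; an empty string does not.
print(looks_like_standard_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # True
print(looks_like_standard_inchikey(""))                             # False
```

A check like this could be useful for spotting rows where no key was generated before any duplicate comparison is run.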
NOTE! There is a bug in some versions of Vortex such that the InChIKey is not generated; this was fixed in versions 42289 and later.
The first part of the script creates two new columns, one for the InChIKey and the other for the duplicate flag. It then calculates the InChIKey for each row and populates the table.
The duplicate search takes each InChIKey in turn and compares it with every key in the table; any key that occurs more than once is flagged as a duplicate.
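The flagging logic can also be sketched outside Vortex on a plain list of keys. The version below is an alternative to the pairwise comparison, not the script's own method: it counts each key once in a dictionary and then flags every key seen more than once, so it needs one pass to count and one to flag. The function name flag_duplicates is hypothetical, and skipping blank keys is an assumption made here so that rows with no generated InChIKey are not reported as duplicates of each other.

```python
def flag_duplicates(keys, flag="Duplicate"):
    """Return one entry per key: `flag` if the key occurs more than once, else ''."""
    counts = {}
    for key in keys:
        if key:  # assumption: blank or missing keys are ignored
            counts[key] = counts.get(key, 0) + 1
    return [flag if key and counts[key] > 1 else "" for key in keys]


# Placeholder strings stand in for real InChIKeys.
keys = ["AAA", "BBB", "AAA", "", ""]
print(flag_duplicates(keys))  # ['Duplicate', '', 'Duplicate', '', '']
```

For small tables the pairwise search is perfectly adequate; counting first mainly helps when a table holds many thousands of rows.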
The Vortex Script
# A script to flag duplicate structures
#
# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os
import sys

# Columns for the InChIKey and the duplicate flag
InChIKeyColumn = vtable.findColumnWithName("InChIKey",1)
DupColumn = vtable.findColumnWithName("DupFlag",1)

rows = vtable.getRealRowCount()

# Calculate the InChIKey for each structure and store it in the table
for r in range(0, int(rows)):
    mol = vtable.molFileManager.getMolFileAtRow(r)
    inChIKey = vortex.getMolProperty(mol, 'InChIKey')
    InChIKeyColumn.setValueFromString(r, inChIKey)

vtable.fireTableStructureChanged()

rows = vtable.getRealRowCount()

# Compare each InChIKey with every key in the table
for r in range(0, int(rows)):
    SearchInchi = InChIKeyColumn.getValue(r)
    n = 0
    for t in range(0, int(rows)):
        indInchi = InChIKeyColumn.getValue(t)
        if SearchInchi == indInchi:
            n = n + 1
    # Every key matches itself once, so more than one match means a duplicate
    if n > 1:
        DupFlagVal = "Duplicate"
        DupColumn.setValueFromString(r, DupFlagVal)

vtable.fireTableStructureChanged()
The script can be downloaded from here
Page Updated 16 July 2015