Macs in Chemistry

Insanely great science


Multi SMILES to Chemdraw files

I recently wrote a script in response to a tweet that got me interested.


You could of course just copy and paste as SMILES, but this would be very tedious; it is an ideal task for AppleScript. The first part of the script simply defines the location of the folder where the resulting ChemDraw files will be stored, in this case the home folder. We then allow the user to choose the file containing the SMILES strings, read them in, and count how many lines there are. The next part loops through each line, copies the SMILES string to the clipboard, and creates a unique file name for each structure. The script then drives the ChemDraw menus to paste as SMILES and save as a ChemDraw file with the unique file name, and finally clears the document so the next SMILES string can be pasted.
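The non-ChemDraw part of that workflow might look roughly like the sketch below. It is only an outline under stated assumptions, not the downloadable script itself: the names smiFile and lineCount and the "structure_" file name prefix are made up for illustration, and the ChemDraw menu steps are left as a comment because the menu scripting depends on the ChemDraw version installed.

```applescript
-- Folder where the ChemDraw files will be stored (here the home folder)
set target_folder to (path to home folder as text)

-- Let the user choose the SMILES file, then read it one line per list item
-- (smiFile and lineCount are assumed names used only in this sketch)
set smiFile to choose file with prompt "Select the file containing the SMILES strings"
set smiList to paragraphs of (read smiFile)
set lineCount to count of smiList

repeat with i from 1 to lineCount
	set mySMILES to item i of smiList as text
	if mySMILES is not "" then -- skip blank lines, e.g. after a trailing return
		-- Put the SMILES string on the clipboard ready for pasting
		set the clipboard to mySMILES
		-- Build a unique file name from the line number (assumed naming scheme)
		set file_name to "structure_" & i & ".cdx"
		set my_file to target_folder & file_name
		-- The ChemDraw steps would go here: paste as SMILES,
		-- save as my_file, then clear the document
	end if
end repeat
```

Skipping empty lines avoids pasting an empty clipboard when the input file ends with a blank line.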

A reader wrote in to ask if it might be possible to modify the script to name each file using the identifier that accompanies the SMILES string in the input file, as shown below.

Ic1ccccc1   ID_1
CC=O    ID_2
CC(O)=O     ID_3
CC(OC(C)=O)=O   ID_4
CC(C)=O     ID_5
CC#N    ID_6
CC(c1ccccc1)=O  ID_7
CC(Br)=O    ID_8
CC(Cl)=O    ID_9

In this case the SMILES string is followed by the identifier in a tab-delimited text file.

A minor modification to the script allows this. Instead of simply using mySMILES as the input SMILES, we take each line and parse it into separate items using "tab" as the delimiter.

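set mySMILES to item i of smiList as text
--display dialog mySMILES

set tid to AppleScript's text item delimiters -- save current delimiters so they can be restored
set AppleScript's text item delimiters to tab -- prepare to parse line using tab
set theTextItems to text items of mySMILES -- parse line into fields
set AppleScript's text item delimiters to tid -- restore text delimiters

set theSMILES to item 1 of theTextItems -- first field: the SMILES string
set theName to item 2 of theTextItems -- second field: the identifier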

We then use theSMILES as the input SMILES string and theName as the file name:

set the clipboard to {text:(theSMILES as string), Unicode text:theSMILES}

set file_name to theName & ".cdx"
set my_file to target_folder & file_name
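If some lines in the input file have no identifier, item 2 of theTextItems will raise an error. One possible way to guard against that is sketched below. The handler name parseLine and the "structure_" fallback name are assumptions made for this illustration and are not part of the downloadable script.

```applescript
-- Hypothetical handler: split one line into {SMILES, name},
-- falling back to a numbered name when no identifier is present
on parseLine(theLine, lineNumber)
	set tid to AppleScript's text item delimiters -- save current delimiters
	set AppleScript's text item delimiters to tab -- split the line on tabs
	set theTextItems to text items of theLine
	set AppleScript's text item delimiters to tid -- restore delimiters
	set theSMILES to item 1 of theTextItems
	set theName to "structure_" & lineNumber -- assumed fallback name
	if (count of theTextItems) > 1 then
		if item 2 of theTextItems is not "" then
			set theName to item 2 of theTextItems
		end if
	end if
	return {theSMILES, theName}
end parseLine

-- Example use with one line from the sample file
set {theSMILES, theName} to parseLine("CC=O" & tab & "ID_2", 2)
set file_name to theName & ".cdx" -- file_name is now "ID_2.cdx"
```

Keeping the delimiter handling inside one handler also means the text item delimiters are always restored before the rest of the script continues.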

The end result is a folder of ChemDraw structures, each named with its identifier.


You can download the script together with a couple of test files here

Last Updated 6 July 2019