Macs in Chemistry

Insanely great science

AppleScript Tutorial 3

Reading, Writing and Using Lists

The following AppleScript uses ChemDraw to calculate a variety of molecular properties and stores each one in its own variable. These values can then be used elsewhere in the script. Here that is demonstrated, rather trivially, with the display dialog command. The script can be downloaded here.

tell application "CS ChemDraw Ultra"
    -- read the properties of the currently selected structure
    set the_SMILES to SMILES of selection
    set Elem_Anal to Elemental Analysis of selection
    set Exact_mass to Exact Mass of selection
    set Mol_Form to Molecular Formula of selection
    set Mol_weight to Molecular Weight of selection

    -- combine the values into one string, one labelled property per line
    set Chem_props to "SMILES " & the_SMILES & return & "Elemental Analysis " & Elem_Anal & return & "Exact Mass " & Exact_mass & return & "Molecular Formula " & Mol_Form & return & "Molecular Weight " & Mol_weight

    display dialog Chem_props
end tell

This is fine if all you have to do is calculate the properties of a single molecule. But what if you want to perform the calculation on a list of structures?

You can download the example file here: temp_mac.txt (Control-click on the link and choose "Save link ...."). What we need to do now is have the user choose a file, read its contents and then store the data in a list. Lists are simply groups of values enclosed in curly braces, for example {1, 2, 3} or {1, "b", "hello", {1, 3, 5}}. As you can see, a list can mix types and can even contain another list. So in the script below we first define the list we will read the molecules into. Then we get the user to choose a file and read the contents of the file into theData.

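set mol_list to {}
set theData to ""

-- ask the user for a text file
set theFile to (choose file with prompt "Select the file:" of type {"TEXT"}) as alias

-- open the file, read it as a list of lines (split at each return), then close it
set fileRef to open for access theFile
set theData to read fileRef using delimiter return
close access fileRef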

If you copy and paste the above text into Script Editor, compile it, select "Event Log" and click "Run", you can choose the temp_mac.txt file. You should then see a result like the one below. Each line of the file is read in as one item of the list:-

{"c1ccccc1 benzene", "Ic1ccccc1 iodobenzene", "O=C1CCCCC1 cyclohexanone", "NC1CCCCC1 cyclohexamine", "CN(C)c1cccnc1 3-dimethylaminopyridine", "N1(c2ccccc2)CCNCC1 phenylpiperazine"}
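The rest of this tutorial is mostly list handling. This short sketch can be pasted into Script Editor to try the basic list operations mentioned above: counting, picking an item, reaching into a nested list and appending. The variable names are arbitrary and chosen for illustration.

```applescript
-- A list can mix types and can even contain another list
set sample_list to {1, "b", "hello", {1, 3, 5}}

set list_length to count of sample_list -- 4
set second_item to item 2 of sample_list -- "b"
set nested_item to item 3 of item 4 of sample_list -- 5, the third item of the inner list
set last_entry to last item of sample_list -- {1, 3, 5}

-- Adding a value to the end of a list
set mol_list to {}
copy "c1ccccc1" to the end of mol_list -- mol_list is now {"c1ccccc1"}
```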


Having read the file, we will of course want to write out the results at some point. This is therefore a good time to think about the file we will be saving to. We want to save the results in the same folder as the file we read in, and a simple sub-routine does the work. We pass theFile to the sub-routine, which returns the folder in which the file resides. It is then a simple task to append the output file name.

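-- save the results next to the input file
set the_file_path to GetParentPath(theFile)
set theSaveFile to the_file_path & "test2.smi"

-- return the folder containing theFile as an HFS path string (ending in ":")
on GetParentPath(theFile)
    tell application "Finder" to return (container of theFile) as text
end GetParentPath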
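If you would rather not involve the Finder, the same folder path can be worked out from the file's HFS path text by splitting it at the colons. The following is a minimal sketch. parent_folder_of is a hypothetical handler name, and it assumes it is given the path of a file, not of a folder (a folder path ends in a colon). The example path is also made up.

```applescript
-- Hypothetical alternative to GetParentPath: pure text handling, no Finder.
-- Assumes hfs_path is the path of a file, e.g. "Disk:Folder:file.txt".
on parent_folder_of(hfs_path)
    set old_delims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ":"
    set path_parts to text items of hfs_path
    -- drop the last component (the file name) and rejoin the rest with ":"
    set parent_path to (items 1 thru -2 of path_parts) as text
    set AppleScript's text item delimiters to old_delims
    return parent_path & ":"
end parent_folder_of

-- In the real script this would be: parent_folder_of(theFile as text)
set the_file_path to parent_folder_of("Macintosh HD:Users:chem:temp_mac.txt")
set theSaveFile to the_file_path & "test2.smi"
```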

Now that all the data is in a list, we can begin to manipulate it. First we need to get the SMILES strings. At the moment the first item in the list is "c1ccccc1 benzene", so we need to separate the two terms. Start by changing the text item delimiters to tab. A simple repeat loop then splits each item in theData and copies the result to the end of mol_list. Remember to change the delimiters back afterwards!

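-- split each line at the tab into {SMILES, name}
set old_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with i from 1 to count of theData
    set theLine to text items of item i of theData
    copy theLine to the end of mol_list
end repeat
-- change the delimiters back
set AppleScript's text item delimiters to old_delims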

The result is a list of lists:-
{{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}, {"O=C1CCCCC1", "cyclohexanone"}, {"NC1CCCCC1", "cyclohexamine"}, {"CN(C)c1cccnc1", "3-dimethylaminopyridine"}, {"N1(c2ccccc2)CCNCC1", "phenylpiperazine"}}
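The save, split and restore steps in the loop above are a good candidate for a reusable sub-routine. The following is a sketch only. split_text is a hypothetical handler name, and the two input lines are sample data in the format of temp_mac.txt. The try block makes sure the old delimiters are put back even if the split fails.

```applescript
-- Hypothetical helper: split the_text at the_delim and return a list,
-- restoring the previous text item delimiters afterwards
on split_text(the_text, the_delim)
    set old_delims to AppleScript's text item delimiters
    try
        set AppleScript's text item delimiters to the_delim
        set the_parts to text items of the_text
        set AppleScript's text item delimiters to old_delims
        return the_parts
    on error err_msg number err_num
        set AppleScript's text item delimiters to old_delims
        error err_msg number err_num
    end try
end split_text

-- Sample lines standing in for the contents of theData
set theData to {"c1ccccc1" & tab & "benzene", "Ic1ccccc1" & tab & "iodobenzene"}
set mol_list to {}
repeat with i from 1 to count of theData
    copy split_text(item i of theData, tab) to the end of mol_list
end repeat
-- mol_list is {{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}}
```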

We can now take both the SMILES string and the name from each item of mol_list, put the SMILES on the clipboard and use ChemDraw to calculate the properties.

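-- runs inside a loop: repeat with i from 1 to count of mol_list
set mol_props_list to {} -- start a fresh list for this compound's results
set the_compound to item i of mol_list
set the_SMILES to item 1 of the_compound
set the_name to item 2 of the_compound
--display dialog the_SMILES
--display dialog the_name
-- put the SMILES on the clipboard so that ChemDraw can paste it
set the clipboard to the_SMILES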
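The fragment above uses the loop counter i, so in the full script it sits inside a repeat loop over mol_list. The following minimal sketch shows that surrounding loop. It uses two sample compounds, and the_names is a hypothetical list that stands in for the ChemDraw work.

```applescript
-- Sample data in the same shape as mol_list
set mol_list to {{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}}
set the_names to {}

repeat with i from 1 to count of mol_list
    set the_compound to item i of mol_list
    set the_SMILES to item 1 of the_compound
    set the_name to item 2 of the_compound
    -- the real script would put the_SMILES on the clipboard and call ChemDraw here
    copy the_name to the end of the_names
end repeat
-- the_names is {"benzene", "iodobenzene"}
```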

However, getting ChemDraw to create the chemical structure from the SMILES string is not straightforward, because there is no "Paste SMILES" command in the AppleScript dictionary. Instead we script the menus to paste the SMILES (Edit > Paste Special > SMILES). You have seen the rest of the ChemDraw commands before. We then combine all the data items for a single compound into a list, mol_props_list, and add that list to the end of all_mols_list.

tell application "CS ChemDraw Ultra"
    activate

    -- there is no "Paste SMILES" command, so use Edit > Paste Special > SMILES
    if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"

    -- the newly pasted structure should be the selection; calculate its properties
    set the_CD_SMILES to SMILES of selection
    set Elem_Anal to Elemental Analysis of selection
    set Exact_mass to Exact Mass of selection
    set Mol_Form to Molecular Formula of selection
    set Mol_weight to Molecular Weight of selection

    -- collect everything for this compound in one list
    copy the_SMILES to the end of mol_props_list
    copy the_name to the end of mol_props_list
    copy the_CD_SMILES to the end of mol_props_list
    copy Elem_Anal to the end of mol_props_list
    copy Exact_mass to the end of mol_props_list
    copy Mol_Form to the end of mol_props_list
    copy Mol_weight to the end of mol_props_list

    -- clear the structure ready for the next compound
    if enabled of menu item "Paste" then do menu item "Clear" of menu "Edit"
    --display dialog (item 3 of mol_props_list)
end tell
-- add this compound's list to the list of all compounds
copy mol_props_list to the end of all_mols_list
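The copy command stores an independent copy of mol_props_list. This means the list can be emptied and refilled for the next compound without disturbing what is already in all_mols_list. The following small sketch illustrates this with made-up values in place of the ChemDraw results. It does not call ChemDraw.

```applescript
set all_mols_list to {}

-- first compound
set mol_props_list to {}
copy "c1ccccc1" to the end of mol_props_list
copy "benzene" to the end of mol_props_list
copy mol_props_list to the end of all_mols_list

-- start afresh for the second compound; without this reset the second
-- entry of all_mols_list would contain all four values
set mol_props_list to {}
copy "Ic1ccccc1" to the end of mol_props_list
copy "iodobenzene" to the end of mol_props_list
copy mol_props_list to the end of all_mols_list

-- all_mols_list is {{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}}
```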

It only remains to convert each compound's list to tab-delimited text and save the result. The repeat loop does the conversion, and the write_to_file sub-routine appends each line to the file. It is probably worth mentioning that keeping regularly used snippets of code as sub-routines certainly helps the cut-and-paste school of programming!

repeat with i from 1 to count of all_mols_list
    set mol_list to item i of all_mols_list
    -- convert the list for one compound to tab-delimited text
    set old_delim to AppleScript's text item delimiters
    set AppleScript's text item delimiters to tab
    set mol_list to mol_list as text
    set AppleScript's text item delimiters to old_delim
    -- end each line with a return (Mac-style); use linefeed instead for
    -- Unix-style line endings (AppleScript 2.0 and later)
    set mol_list to mol_list & return
    my write_to_file(mol_list, theSaveFile, true)
end repeat



on write_to_file(this_data, target_file, append_data)
    try
        set the target_file to the target_file as text
        set the open_target_file to ¬
            open for access file target_file with write permission
        if append_data is false then ¬
            set eof of the open_target_file to 0
        write this_data to the open_target_file starting at eof
        close access the open_target_file
        return true
    on error
        try
            close access file target_file
        end try
        return false
    end try
end write_to_file
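In the same cut-and-paste spirit, the delimiter handling in the conversion loop could move into a sub-routine of its own. The following is a sketch only. join_list is a hypothetical name, and the sample values are placeholders. The commented usage line assumes the all_mols_list, theSaveFile and write_to_file from the script above.

```applescript
-- Hypothetical helper: join the items of the_list into one string,
-- with the_delim between them, restoring the old delimiters afterwards
on join_list(the_list, the_delim)
    set old_delims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to the_delim
    set joined_text to the_list as text
    set AppleScript's text item delimiters to old_delims
    return joined_text
end join_list

-- One tab-delimited line for a single compound, ending in a return
set mol_line to join_list({"c1ccccc1", "benzene", "C6H6"}, tab) & return

-- In the full script the loop body could then shrink to:
-- my write_to_file(join_list(item i of all_mols_list, tab) & return, theSaveFile, true)
```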

The complete script is available here: ChemPropsMac.scpt.

UNIX rears its head again

The problem is that SMILES files often arrive from UNIX systems, and there are two different line-ending conventions in Mac OS X:

- Mac-style: lines end with a return ("\r", ASCII character 13).
- Unix-style: lines end with a line feed ("\n", ASCII character 10).

So if we try to read the Unix file temp_unix.txt (available here), we have a problem. The file contains no return characters, so reading it using delimiter return finds nothing to split on, and the entire text is read in as a single value.
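You can reproduce the problem without a file, because splitting text with text item delimiters behaves much like reading with a delimiter. The following minimal sketch uses two sample lines. It assumes AppleScript 2.0 or later (Mac OS X 10.5+), which provides the linefeed constant.

```applescript
-- Two lines in Unix style: separated by a linefeed (ASCII 10), no return
set unix_text to "c1ccccc1" & tab & "benzene" & linefeed & "Ic1ccccc1" & tab & "iodobenzene"

set old_delims to AppleScript's text item delimiters
-- Splitting on return (ASCII 13) finds no delimiter, so everything stays in one item
set AppleScript's text item delimiters to return
set split_on_return to text items of unix_text
-- Splitting on linefeed gives one item per line
set AppleScript's text item delimiters to linefeed
set split_on_linefeed to text items of unix_text
set AppleScript's text item delimiters to old_delims

-- count of split_on_return is 1; count of split_on_linefeed is 2
```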


The next tutorial will deal with this type of issue.