Macs in Chemistry

Insanely great science

AppleScript Tutorial 8

Rendering chemical structures embedded in graphics files

Rich Apodaca has been discussing embedding molecular information in images of molecules, such as a PNG file depicting a 2D structure. As we move to a more web-centric view of the world, it is apparent that much research information will be available only via the web. Images of chemical structures are usually adequate for a human viewer, but the structure they depict cannot be indexed and subsequently searched. In a subsequent article Rich showed a method of extracting the information as text. In this tutorial I'm going to show how to use AppleScript to extract the information from the PNG file and then display the structure, in an editable form, in a couple of chemical display packages.

This script requires a few things: ChemBioDraw (aka ChemDraw), MacPyMOL (http://pymol.sourceforge.net/), and the excellent ExifTool by Phil Harvey (http://www.sno.phy.queensu.ca/~phil/exiftool/). ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in image, audio and video files. You will also need a couple of image files.
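Because the script calls ExifTool through the shell, it is worth confirming from Terminal that the command can be found before going further. The following is only a minimal sketch, assuming a POSIX shell; the helper name have_tool is made up for illustration and is not part of ExifTool.

```shell
# Return success if the named command can be found by the shell.
# have_tool is a hypothetical helper name used only in this sketch.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool exiftool; then
    # -ver prints the installed ExifTool version number
    exiftool -ver
else
    echo "exiftool not found on PATH; install it before running the script"
fi
```

If the AppleScript later reports that exiftool cannot be found even though this check succeeds, a likely cause is that do shell script searches a shorter PATH than Terminal does. Using the full path reported by "command -v exiftool" inside the script is one possible workaround.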

The first file is Lipitor, generated by Geoff Hutchison; it contains the chemical information embedded in both SMILES and molfile formats.

lipitor1

The second is rosiglitazone, from Rich Apodaca, which has the chemical information embedded in molfile format only. You can drag these images to your desktop to work with them.

rosiglitazone

The first part of the script simply asks the user to choose the image file and then creates the POSIX path to the file, since this is needed by ExifTool. The next part creates a three-button dialog box allowing the user to choose the application in which to view the resulting structure.

The main part of the script then uses ExifTool to extract the metadata. In the case of ChemDraw we can generate the structure from either the SMILES string or the molfile data. As written, the script first tries to extract the SMILES and then checks whether a string was returned. If there is no SMILES, it gets the molfile information instead. When using the molfile we need to save the data to a file called temp.mol using the write_to_file routine (the file is actually saved into the temporary items folder); this file is then opened in ChemBioDraw. If SMILES data is present, we can simply use menu item scripting within ChemBioDraw to create the structure via the "SMILES" option of "Paste Special". The metadata sometimes contains tabs, so the replace_chars find-and-replace routine is used to remove them.
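The extraction logic described above can be tried on its own in Terminal before involving ChemBioDraw. This is a sketch under assumptions rather than the script's actual code: the function name extract_structure and the default output file name are made up for illustration, and it assumes exiftool is on the PATH and that the image uses the same SMILES and molfile tag names as the script.

```shell
# Print the SMILES string if the image has one; otherwise save the
# molfile block to a file and print that file's path.
# extract_structure is a hypothetical helper, not an ExifTool command.
extract_structure() {
    png="$1"
    out="${2:-temp.mol}"

    # -b writes the raw tag value; tr strips any embedded tabs
    smiles=$(exiftool -SMILES -b "$png" | tr -d '\t')

    if [ -n "$smiles" ]; then
        printf 'SMILES: %s\n' "$smiles"
    else
        # No SMILES tag, so fall back to the molfile tag
        exiftool -molfile -b "$png" > "$out"
        printf 'molfile: %s\n' "$out"
    fi
}

# Example call (the path is a placeholder):
# extract_structure "$HOME/Desktop/lipitor1.png"
```

With the two sample images, the Lipitor file should take the SMILES branch and the rosiglitazone file the molfile branch, since the latter carries only molfile data. Quoting "$png" keeps the command working when the path contains spaces.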

As an alternative, MacPyMOL can be used to display the structures. Since MacPyMOL cannot convert SMILES to structures (this will be possible using the next version of OpenBabel, which will be released early 2008), we can only use the molfile information.

If we now run the script, choose the rosiglitazone.png file and then select ChemBioDraw for display, you should get this result.

rosiglitazone_CD

Using lipitor1.png and MacPymol you should see this.

lipitor_Macpymol

You can download a copy of the script here.

set theMetadata to ""

set theFile to (choose file with prompt "Choose an image file")
set the_path to theFile as string
--display dialog the_path
set posix_path to POSIX path of the_path


display dialog "How would you like to display the structure? " buttons {"Cancel", "ChemDraw", "MacPyMol"} default button 1
if the button returned of the result is "ChemDraw" then
   --set theScript to "exiftool -SMILES -b  /Users/username/Desktop/lipitor1.png"

    --quoted form of protects paths that contain spaces or other shell characters
    set theScript to "exiftool -SMILES -b " & quoted form of posix_path

    set theMetadata to (do shell script theScript)

    if theMetadata is "" then
        set theScript to "exiftool -molfile -b " & quoted form of posix_path

        --use as text to remove non-printing characters
        set theMetadata to (do shell script theScript) as text


        set target_file to (path to temporary items folder as string) & "temp.mol"
        write_to_file(theMetadata, target_file, false)

        tell application "CS ChemBioDraw Ultra"
           activate
            open file target_file
       end tell
   else
       
        set this_text to (replace_chars(theMetadata, tab, ""))
        set the clipboard to this_text as text

        tell application "CS ChemBioDraw Ultra"
           activate

            if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"

       end tell

   end if
else if the button returned of the result is "MacPyMol" then
   
    set theScript to "exiftool -molfile -b " & quoted form of posix_path

    --use as text to remove non-printing characters
    set theMetadata to (do shell script theScript) as text


    set target_file to (path to temporary items folder as string) & "temp.mol"
    write_to_file(theMetadata, target_file, false)

    tell application "MacPyMOL"
       activate
        open file target_file
   end tell
else
   quit
end if

--Routines
on replace_chars(this_text, search_string, replacement_string)
    set AppleScript's text item delimiters to the search_string
    set the item_list to every text item of this_text
    set AppleScript's text item delimiters to the replacement_string
    set this_text to the item_list as string
    set AppleScript's text item delimiters to ""
    return this_text
end replace_chars

on write_to_file(this_data, target_file, append_data)
    try
        set the target_file to the target_file as text
        set the open_target_file to open for access file target_file with write permission
        if append_data is false then set eof of the open_target_file to 0
        write this_data to the open_target_file starting at eof
        close access the open_target_file
        return true
    on error
        try
            close access file target_file
        end try
        return false
    end try
end write_to_file