Macs in Chemistry

Insanely great science

 

Scripting the Chemical Identifier Resolver

The name-to-structure feature in ChemBioDraw is very useful, but it is largely limited to systematic names and does not support other chemical identifiers such as CAS Numbers. A number of online services do support this sort of conversion, but you end up having to cut and paste between different web sites. This is where the Chemical Identifier Resolver comes into play.

About the Chemical Identifier Resolver

The Chemical Identifier Resolver (CIR) by the CADD Group at the NCI/NIH is a web service that performs various chemical name-to-structure conversions. The service works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier. It can help you identify and find the chemical structure if you have an identifier such as an InChIKey or CAS Number. You can either use the resolver's web form or the following simple URL:

http://cactus.nci.nih.gov/chemical/structure/"structure identifier"/"representation"

Example: Chemical name to SMILES: http://cactus.nci.nih.gov/chemical/structure/aspirin/smiles

The input identifier can be a chemical name, SMILES, CAS Number, InChI, etc., and the returned representation can be SMILES, sdf, png, etc. This is achieved by using a combination of OPSIN, ChemSpider and CIR's own database.
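The URL scheme above can also be driven from a few lines of Python. The following is a minimal sketch, not part of the original script: the names `build_cir_url` and `resolve`, and the `fetch` hook used to swap out the network call, are hypothetical and introduced only for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the scheme shown above.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def build_cir_url(identifier: str, representation: str = "smiles") -> str:
    """Return the CIR URL for one identifier and one output representation."""
    # safe="" percent-encodes every reserved character, including "/" and "#",
    # both of which occur in SMILES strings and would otherwise break the path.
    encoded = quote(identifier.strip(), safe="")
    return f"{CIR_BASE}/{encoded}/{representation}"


def resolve(identifier: str, representation: str = "smiles", fetch=None) -> str:
    """Fetch the requested representation as text.

    `fetch` is a hypothetical hook (URL in, text out) so the function can be
    exercised without network access; by default it performs an HTTP GET.
    """
    if fetch is None:
        def fetch(url: str) -> str:
            with urlopen(url, timeout=30) as response:
                return response.read().decode("utf-8")
    return fetch(build_cir_url(identifier, representation)).strip()
```

Calling `resolve("aspirin")` would request http://cactus.nci.nih.gov/chemical/structure/aspirin/smiles, the same URL as the example above, while `resolve("50-78-2", "sdf")` would ask for an SD file for that CAS Number. What comes back depends on the service.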

The Chemical Identifier Resolver script

The first part of the script captures the text contents of the clipboard and then creates a dialog box with the text box populated with the clipboard contents. The user can of course type or paste alternative text into the box.

[Screenshot: the CIR input dialog]

One issue is that the text strings can contain special characters such as "#" or "%", which will break the URL. The standard practice when creating URLs is to encode spaces and special characters as a percent sign followed by their hexadecimal character code. For example, a space in a URL becomes %20. The next part of the script uses subroutines to encode these characters and returns the encoded text. The desired URL is then created and curl (a command line tool for transferring data with URL syntax) is used to request the SMILES string, which is then copied to the clipboard.
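To make the encoding step concrete, here is a small Python sketch of the same idea: leave letters, digits and a few safe punctuation characters alone, and replace everything else with a percent sign and two hexadecimal digits. The function names mirror the subroutines in the script below, but this is an illustrative re-implementation under stated assumptions, not a translation. In particular it encodes the UTF-8 bytes of each character, which matches the script's output for plain ASCII input only.

```python
# Characters passed through unchanged: letters, digits and ". - _ :",
# the same set the script below leaves unencoded.
UNENCODED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789.-_:"
)


def encode_char(char: str) -> str:
    """Percent-encode one character: %XX for each of its UTF-8 bytes."""
    return "".join(f"%{byte:02X}" for byte in char.encode("utf-8"))


def encode_text(text: str) -> str:
    """Percent-encode every character that is not in UNENCODED."""
    return "".join(c if c in UNENCODED else encode_char(c) for c in text)


if __name__ == "__main__":
    for sample in ("acetic acid", "C#N", "50-78-2"):
        print(sample, "->", encode_text(sample))
```

A name such as "acetic acid" therefore has its space replaced by %20, and the "#" in a SMILES triple bond is replaced by %23, while a CAS Number passes through untouched.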

The final part of the script causes ChemBioDraw to activate and then uses “Paste Special” to use the SMILES string to create a structure.

UPDATE!!

I was reminded that, whilst scripting menu items was the traditional way of controlling ChemDraw, the more recent releases allow control by scripting commands. This is a major advance, since menus can change or be translated into other languages.

Thus:

if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"

now becomes

if enabled of command "pasteAsSMILES" then do command "pasteAsSMILES"

The updated script is available here: CIRStructureCD.scpt.zip

It needs to be stored in:

/Users/username/Library/Application Support/CambridgeSoft CS ChemOffice 2010/ChemDraw

See also the corresponding script for Marvin Sketch.

If you are interested in a Python interface read this.

The Script

--Created by Macs in Chemistry (http://www.macinchem.org)

set the clipboard to «class ktxt» of ((the clipboard as text) as record)

set the_clip to the clipboard
--Comment out if not needed
--display dialog the_clip


display dialog "Input Name" default answer the_clip buttons {"Cancel", "Get Structure"} default button 2
--Read the field by name rather than relying on the order of items in the result record
set text_returned to text returned of the result
--Need to encode the text to make sure the URL is OK
set the_encode_text to encode_text(text_returned, true, false)


--set theURL to "http://cactus.nci.nih.gov/chemical/structure/name/smiles"

set theURL to "http://cactus.nci.nih.gov/chemical/structure/" & the_encode_text & "/smiles"


--quoted form protects the URL from being interpreted by the shell
set the_SMILES to (do shell script "curl -L " & quoted form of theURL)


set the clipboard to the_SMILES


tell application "CS ChemBioDraw Ultra"
    activate
    --if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"
    if enabled of command "pasteAsSMILES" then do command "pasteAsSMILES"
end tell


on encode_text(this_text, encode_URL_A, encode_URL_B)
    set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
    set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
    set the URL_B_chars to ".-_:"
    set the acceptable_characters to the standard_characters
    if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
    if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
    set the encoded_text to ""
    repeat with this_char in this_text
        if this_char is in the acceptable_characters then
            set the encoded_text to (the encoded_text & this_char)
        else
            set the encoded_text to (the encoded_text & encode_char(this_char)) as string
        end if
    end repeat
    return the encoded_text
end encode_text

on encode_char(this_char)
    set the ASCII_num to (the ASCII number this_char)
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    set x to item ((ASCII_num div 16) + 1) of the hex_list
    set y to item ((ASCII_num mod 16) + 1) of the hex_list
    return ("%" & x & y) as string
end encode_char