Macs in Chemistry

Insanely great science

 

Searching eMolecules

I suspect that everyone has their own preferred online chemical supplier; I use eMolecules much of the time. However, I'd prefer not to have to learn a new chemical drawing package for every online database. Fortunately, most of the sites will accept a SMILES string as the query, and most of the desktop drawing packages can generate one.

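The following AppleScript uses ChemBioDraw (aka ChemDraw) to generate the SMILES string of the selected structure; if no structure is selected, the script performs a "Select All". One issue is that SMILES strings can contain special characters such as "#" (a triple bond) or "%" (a two-digit ring closure), and these will break the URL if left as they are. A raw "#", for example, is read as the start of the URL fragment, so everything after it never reaches the server.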

The standard practice when creating URLs is to encode reserved characters, high-ASCII characters and spaces as a percent sign followed by their two-digit hexadecimal code. For example, a space in a URL is converted to %20. You can read more about it here. The next part of the script uses sub-routines to encode these characters and returns the encoded SMILES string.
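To make the arithmetic concrete, here is a minimal sketch that encodes a single character by hand. It is only an illustration and not part of the downloadable script: the variable names are my own, and it assumes AppleScript 2.0 or later, where "id of" returns a character's code point.

```applescript
-- Worked example: percent-encode one character ("#", the SMILES triple bond)
set hex_digits to "0123456789ABCDEF"
set code_point to id of "#" -- 35 in ASCII
-- 35 div 16 = 2, so the first hex digit is "2"
set high_digit to character ((code_point div 16) + 1) of hex_digits
-- 35 mod 16 = 3, so the second hex digit is "3"
set low_digit to character ((code_point mod 16) + 1) of hex_digits
set encoded_character to "%" & high_digit & low_digit
```

Here encoded_character ends up as "%23", so a nitrile written as C#N would travel in the URL as C%23N.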

Whilst eMolecules offers many options for modifying the search, I've included just two: an exact search and a substructure search. The next step constructs the two URLs using the encoded SMILES string. A dialog then gives the user the option of which query to run. The script can be downloaded from here. It needs to be installed in the "/Applications/CS ChemOffice 2008/CS ChemDraw/ChemDraw Items" folder, and it should then appear in the "Scripts" menu the next time you restart ChemBioDraw.
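The two URLs differ only in the value of the "t" parameter, so the construction step could also be written as one small handler. This is just a sketch of that idea: the handler name emolecules_URL and the variable names are hypothetical, and the only parameter values it relies on are the "ex" and "ss" that appear in the script's own URLs.

```applescript
-- Hypothetical helper: build an eMolecules search URL from a search type
-- and a SMILES string that has already been percent-encoded
on emolecules_URL(search_type, encoded_SMILES)
    -- search_type is "ex" (exact) or "ss" (substructure)
    return "http://www.emolecules.com/cgi-bin/search?t=" & search_type & "&v=&cg=&molfile=&bb_prices=&q=" & encoded_SMILES
end emolecules_URL

-- IC1=CC=CC(N)=C1 with "=", "(" and ")" encoded as %3D, %28 and %29
set example_query to "IC1%3DCC%3DCC%28N%29%3DC1"
set exact_URL to emolecules_URL("ex", example_query)
set substructure_URL to emolecules_URL("ss", example_query)
```

Either result can then be handed to "open location" to launch the search in the default web browser.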


--http://www.emolecules.com/cgi-bin/search?t=ss&v=&cg=&molfile=&bb_prices=&q=IC1=CC=CC(N)=C1  substructure
--http://www.emolecules.com/cgi-bin/search?t=ex&v=&cg=&molfile=&bb_prices=&q=IC1=CC=CC(N)=C1  Exact


tell application "CS ChemBioDraw Ultra"
    activate
    -- "Copy" is disabled when nothing is selected, so select the whole drawing first
    if not (enabled of menu item "Copy") then
        do menu item "Select All" of menu "Edit"
    end if
    set theSMILES to SMILES of selection
end tell

-- Encode the special characters (URL_A set) but leave ".-_:" untouched
set the_encode_SMILES to encode_text(theSMILES, true, false)
--display dialog the_encode_SMILES

set emolecules_EX_URL to "http://www.emolecules.com/cgi-bin/search?t=ex&v=&cg=&molfile=&bb_prices=&q=" & the_encode_SMILES
set emolecules_SS_URL to "http://www.emolecules.com/cgi-bin/search?t=ss&v=&cg=&molfile=&bb_prices=&q=" & the_encode_SMILES

-- Pressing "Cancel" raises a "User canceled" error (-128) that stops the script, so it needs no branch of its own
display dialog "Choose Search" buttons {"Exact", "SubStructure", "Cancel"} default button 3
set the button_pressed to the button returned of the result
if the button_pressed is "Exact" then
    -- open in the default web browser
    open location emolecules_EX_URL
else if the button_pressed is "SubStructure" then
    open location emolecules_SS_URL
end if

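-- Returns this_text with every character outside the acceptable set percent-encoded.
-- Pass true for encode_URL_A or encode_URL_B to encode that group of punctuation, false to let it through.
on encode_text(this_text, encode_URL_A, encode_URL_B)
    -- string comparison ignores case by default, so uppercase letters match this list too
    set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
    set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
    set the URL_B_chars to ".-_:"
    set the acceptable_characters to the standard_characters
    if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
    if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
    set the encoded_text to ""
    repeat with this_char in this_text
        -- the loop variable is a reference; take its contents before testing and appending
        set this_char to the contents of this_char
        if this_char is in the acceptable_characters then
            set the encoded_text to (the encoded_text & this_char)
        else
            set the encoded_text to (the encoded_text & encode_char(this_char))
        end if
    end repeat
    return the encoded_text
end encode_text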

-- Returns "%" followed by the two-digit hexadecimal code of this_char.
-- SMILES strings are plain ASCII, so the code always fits in two hex digits.
on encode_char(this_char)
    set the ASCII_num to (the ASCII number this_char)
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    set x to item ((ASCII_num div 16) + 1) of the hex_list
    set y to item ((ASCII_num mod 16) + 1) of the hex_list
    return ("%" & x & y) as string
end encode_char