Macs in Chemistry

Insanely great science

 

Search UniChem

UniChem is a new web resource provided by the EBI. It is a 'Unified Chemical Identifier' system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between databases. UniChem currently contains data from 19 different databases:

ChEMBL
DrugBank
PDBe (Protein Data Bank Europe)
International Union of Basic and Clinical Pharmacology
PubChem ('Drugs of the Future' subset)
KEGG (Kyoto Encyclopedia of Genes and Genomes) Ligand
ChEBI (Chemical Entities of Biological Interest)
NIH Clinical Collection
ZINC
eMolecules
IBM strategic IP insight platform and the National Institutes of Health
Gene Expression Atlas
IBM strategic IP insight platform and the National Institutes of Health
FDA/USP Substance Registration System (SRS)
SureChem
PharmGKB
Human Metabolome Database (HMDB)
Selleck
PubChem ('Thomson Pharma' subset)

UniChem's primary function is to maintain cross-references between EBI chemistry resources. These include primary chemistry resources (ChEMBL, ChEBI and PDBeChem) and other resources whose main focus is not small molecules, but which may nevertheless contain some small-molecule information (e.g. Gene Expression Atlas).

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013).

A search uses either the source compound Id, the InChI or the InChI Key, and the query URL has the following format:

https://www.ebi.ac.uk/unichem/frontpage/results?queryText=SMWDFEZZVXVKRB-UHFFFAOYSA-N&kind=InChIKey&sources=&incl=exclude
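As a rough sketch of how this URL is put together, the query text and the kind of identifier can be treated as its two variable parts. The example is in AppleScript, the language of the script further down this page. The handler name unichem_search_url and its parameter names are hypothetical, and only the "InChIKey" value for kind comes from the example URL; the values used for the other identifier types are not shown here.

```applescript
-- Hypothetical helper: assemble a UniChem search URL from its two variable parts.
-- The fixed parts are copied from the example URL above.
-- query_text is assumed to be URL-safe already (true for an InChI Key).
on unichem_search_url(query_text, query_kind)
    set base_url to "https://www.ebi.ac.uk/unichem/frontpage/results"
    return base_url & "?queryText=" & query_text & "&kind=" & query_kind & "&sources=&incl=exclude"
end unichem_search_url

set example_url to unichem_search_url("SMWDFEZZVXVKRB-UHFFFAOYSA-N", "InChIKey")
-- example_url now holds the same address as the example URL above
```

Passing example_url to open location would then open the search in the default web browser.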

Since ChemBioDraw can generate InChI Keys, I thought it might be interesting to write an AppleScript that accesses this service. The InChIKey is a short, fixed-length character signature based on a hash of the InChI string. By definition, hashing is a one-way conversion, so the original structure cannot be restored from the InChIKey, which allows confidential searching.
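Because the key has a fixed length, its layout can be checked before it is sent anywhere. A standard InChIKey is 27 characters long: blocks of 14, 10 and 1 uppercase letters joined by hyphens. The handler below is a minimal sketch of such a check; the name looks_like_inchikey is hypothetical, and it assumes the value passed in is already text. It only tests the layout and does not confirm that the key corresponds to a real structure.

```applescript
-- Hypothetical helper: check that a string has the standard InChIKey layout,
-- i.e. 27 characters made up of 14, 10 and 1 uppercase letters joined by hyphens.
-- Assumes candidate is already text.
on looks_like_inchikey(candidate)
    if (count of characters of candidate) is not 27 then return false
    considering case
        repeat with i from 1 to 27
            set this_char to character i of candidate
            if i is 15 or i is 26 then
                -- positions 15 and 26 hold the two hyphens
                if this_char is not "-" then return false
            else
                -- every other position must be an uppercase letter
                if this_char is not in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" then return false
            end if
        end repeat
    end considering
    return true
end looks_like_inchikey

set good_key to looks_like_inchikey("SMWDFEZZVXVKRB-UHFFFAOYSA-N") -- true
set bad_key to looks_like_inchikey("not a key") -- false
```

A check like this could be run on the clipboard contents before building the search URL, so that an empty or unrelated clipboard does not produce a meaningless query.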

The Script

The script is shown below. The first part scripts the menus and submenus of ChemBioDraw to get the InChI Key; we then construct the URL and open it in the default web browser.

CD_unichem_search.scpt


--https://www.ebi.ac.uk/unichem/frontpage/results?queryText=SMWDFEZZVXVKRB-UHFFFAOYSA-N&kind=InChIKey&sources=&incl=exclude



tell application "CS ChemBioDraw Ultra"
    activate

    -- if nothing is selected, select everything in the document
    if not (enabled of menu item "Copy") then
        do menu item "Select All" of menu "Edit"
    end if

    -- copy the selection to the clipboard as an InChI Key
    if enabled of menu item "Copy" then do menu item "InChI Key" of menu "Copy As" of menu "Edit"

    set theInchikey to the clipboard

    --display dialog theInchikey

end tell

set the_encode_theInchikey to encode_text(theInchikey, true, false)
--display dialog the_encode_theInchikey

set unichem_url to "https://www.ebi.ac.uk/unichem/frontpage/results?queryText=" & the_encode_theInchikey & "&kind=InChIKey&sources=&incl=exclude"


display dialog "Search UniChem" buttons {"Search", "Cancel"} default button 1
-- clicking "Cancel" raises a user-cancelled error (-128), which stops the script here
set the button_pressed to the button returned of the result
if the button_pressed is "Search" then
    --to open in default web browser
    open location unichem_url
end if

on encode_text(this_text, encode_URL_A, encode_URL_B)
    set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
    set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
    set the URL_B_chars to ".-_:"
    set the acceptable_characters to the standard_characters
    if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
    if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
    set the encoded_text to ""
    repeat with this_char in this_text
        if this_char is in the acceptable_characters then
            set the encoded_text to (the encoded_text & this_char)
        else
            set the encoded_text to (the encoded_text & encode_char(this_char)) as string
        end if
    end repeat
    return the encoded_text
end encode_text

on encode_char(this_char)
    set the ASCII_num to (the ASCII number this_char)
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    set x to item ((ASCII_num div 16) + 1) of the hex_list
    set y to item ((ASCII_num mod 16) + 1) of the hex_list
    return ("%" & x & y) as string
end encode_char

The script can be downloaded from here: CD_unichem_search.scpt.zip

The AppleScript section contains more tutorials, scripts and resources.