Macs in Chemistry

Insanely great science

 

Generating a Rule of 7 Profile

I regularly have to profile collections of compounds; these may come from commercial suppliers or could be a virtual library created for a MedChem project. Whilst many people have their own ideas about which physicochemical properties matter most, the seven generated by this script seem to cover almost all requests. It uses the Evaluator from ChemAxon to generate the data and Aabel from Gigawiz to plot the results.

The eagle-eyed among you will have noticed that the shell command in the script actually calculates a number of other properties, including the most acidic and most basic pKa, but these are not plotted. They just give you a flavour of the number of different properties that can be calculated. By editing the part of the script referring to Aabel you can pick and choose what to display, but be careful to note that some are continuous properties, like polar surface area (PSA), whilst others are categorical, such as the count of hydrogen bond donors (HBD), and these use different plots. I've used this script to profile a library of 500,000 compounds; the only issue is that you might have to edit the axis display to avoid overlapping text.
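
To make the continuous versus categorical distinction concrete, here is a minimal Python sketch (not part of the original AppleScript, and not how Aabel works internally). It assumes a continuous property such as PSA is grouped into fixed-width bins for a histogram, whilst a categorical property such as HBD is simply tallied per distinct value. The function names, the bin width and the sample values are all made up for illustration.

```python
from collections import Counter


def bin_continuous(values, bin_width):
    """Group a continuous property into fixed-width bins.

    Returns a dict mapping the lower edge of each bin to the number
    of values that fall into that bin (histogram-style data).
    """
    counts = Counter()
    for v in values:
        lower_edge = (v // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def count_categorical(values):
    """Tally a categorical property: one count per distinct value."""
    return dict(sorted(Counter(values).items()))


# Illustrative, made-up values (not real calculated data)
psa_values = [12.5, 45.0, 47.9, 88.1, 95.3]
hbd_values = [0, 1, 1, 2, 1]

print(bin_continuous(psa_values, 20.0))
print(count_categorical(hbd_values))
```

The binned data suits a histogram, whereas the tallied data suits a bar or column chart with one bar per count.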

The results look like this:
[Image: profile]

--Requires sdf file for IDs

property obgrepPath : "'/usr/local/bin/obgrep'"

set user_path to (path to desktop) as text
--file for calculated results
set result_file to user_path & "results.txt" as text
set posix_result_file to quoted form of POSIX path of result_file

set this_file to choose file
set this_file_text to (this_file as text)
tell application "Finder" to set file_name to (name of this_file)

--display dialog file_name

--get the posix path to chosen file
set posix_this_file to quoted form of POSIX path of this_file
--use openbabel to get number of structures
--posix_this_file is already quoted (quoted form), so no extra quotes are added here
set obgrep_command to obgrepPath & " -v -c \"NNNNNN\" " & posix_this_file

try
    set obgrep_command_shell to obgrep_command & " |cut -d  \" \" -f2"
    set count_lines to (do shell script obgrep_command_shell) as string

    set the_text to " The file " & file_name & " contains " & count_lines & " structures."
    --You can use this as a title for graphs if needed
    --display dialog the_text
end try



--get molecular properties
set shell_script to "'/Applications/ChemAxon/MarvinBeans/bin/evaluate' " & posix_this_file & " -e \"ringAtomCount(); logp(); logd('7.4'); apka('1'); bpka('1'); atomCount(); mass(); acceptorcount(); donorcount(); psa(); rotatablebondcount(); atomCount()-atomCount('1')\"" & "   -o " & posix_result_file



do shell script shell_script
--Add names of properties

set theFileReference to open for access result_file with write permission
set theFileContents to read theFileReference

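--These are the ones I used
--Note: the first expression passed to evaluate above is ringAtomCount(), so the column labelled "name" appears to hold that value rather than a compound ID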

set new_file_contents to "name; logP; logD; apka; bpka; atomCount; mass; HBA; HBD; PSA; RBC; HAC" & "
" & theFileContents
set eof of theFileReference to 0
write new_file_contents to theFileReference starting at eof
close access theFileReference


tell application "Aabel_3"
    Run
    set this_file to "Macintosh HD:Users:swain:Desktop:results.txt"
    set thetabdelimitedfile to (result_file as text) as alias


    ImportDataIntoNewWorksheet thetabdelimitedfile

    set currentdirectory to alias "Macintosh HD:Public"
    SetCurrentDirectory currentdirectory

    --Chart1 logP
    CreateNewViewer "1"
    activate

    SelectChart "8 1"

    SelectDefaultFillColor "120"

    SelectVariables "1 1"

    SetChartInstanceDimensions "0.2 3.2 0.2 3.2"

    --Chart2 Mass
    CreateNewChartInstance " "

    SelectDefaultFillColor "21"

    SelectChart "8 1"

    SelectVariables "1 6"

    SetChartInstanceDimensions "3.2 6.2 0.2 3.2"

    --Chart3 PSA
    CreateNewChartInstance " "

    SelectDefaultFillColor "45"

    SelectChart "8 1"

    SelectVariables "1 9"

    SetChartInstanceDimensions "6.2 9.2 0.2 3.2"

    --Chart4 (HBA)
    CreateNewChartInstance " "

    SelectDefaultFillColor "64"

    SelectChart "8 4 1 6 1 1 -1 0 0 0 0 1 0 0.2 1 1 1"

    SelectVariables "1 7"

    SetChartInstanceDimensions "0.2 3.2 3.2 5.0"
    --Chart5 HBD
    CreateNewChartInstance " "

    SelectDefaultFillColor "64"

    SelectChart "8 4 1 6 1 1 -1 0 0 0 0 1 0 0.2 1 1 1"

    SelectVariables "1 8"

    SetChartInstanceDimensions "0.2 3.2 5.2 7.0"

    --Chart6 HAC

    CreateNewChartInstance " "

    SelectDefaultFillColor "25"

    SelectChart "8 4 1 6 1 1 -1 0 0 0 0 1 0 0.2 1 1 1"

    SelectVariables "1 11"

    SetChartInstanceDimensions "3.2 6.2 3.2 6.2"

    --Chart7 RBC


    CreateNewChartInstance " "

    SelectDefaultFillColor "245"

    SelectChart "8 4 1 6 1 1 -1 0 0 0 0 1 0 0.2 1 1 1"

    SelectVariables "1 10"

    SetChartInstanceDimensions "6.2 9.2 3.2 6.2"

    SetDefaultTextLineFont "Helvetica"

    SetDefaultTextLineFontStyle "Bold"

    SetDefaultTextLineFontSize "14"

    SelectDefaultLineColor "1"

    --CreateTextLine the_text

    --If you want a pdf copy
    --ExportVisibleViewerContent "MyfileNew.pdf"



end tell
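
As an aside, the script above relies on obgrep to count the structures in the chosen file. If Open Babel is not available, a rough count can be obtained by counting the "$$$$" lines that terminate each record in an SDF file. The Python sketch below is an alternative illustration only, not what obgrep does internally; the function name and the stripped-down sample text are hypothetical.

```python
def count_sdf_records(sdf_text):
    """Count SDF records by counting lines that consist only of '$$$$'.

    Each record in an SDF file ends with a '$$$$' delimiter line, so the
    number of delimiter lines equals the number of structures.
    """
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Stripped-down, made-up sample: two records with the mol blocks omitted
sample = """first record (mol block omitted)
> <compound_id>
cpd1

$$$$
second record (mol block omitted)
> <compound_id>
cpd2

$$$$
"""

print(count_sdf_records(sample))
```

For a real file, read its contents first, for example with `open(path).read()`, and pass the text to the function.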



The shell command below calculates a variety of descriptors, provided the structures are supplied in SDF format. Most will be self-evident, but a couple might require further explanation.

set shell_script to "'/Applications/ChemAxon/MarvinBeans/bin/evaluate' " & posix_this_file & " -e \"field('compound_id'); molString('smiles'); logp(); logd('7.4'); apka('1'); bpka('1'); atomCount(); mass(); acceptorcount(); donorcount(); psa(); rotatablebondcount(); atomCount()-atomCount('1'); aromaticAtomCount()/(atomCount()-atomCount('1'))\"" & " -o " & posix_result_file

field('compound_id') This extracts the contents of the field "compound_id" in the SDF file; you will of course need to change this to the actual name of the field in the file you are dealing with.

atomCount()-atomCount('1') This is a count of all atoms minus hydrogen atoms to give the heavy atom count.

aromaticAtomCount()/(atomCount()-atomCount('1')) This gives the fraction of heavy atoms that are aromatic.
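
The arithmetic behind these last two expressions can be checked by hand. The Python sketch below mirrors them with plain numbers; it does not call ChemAxon, and the function names are hypothetical. The worked example uses toluene (C7H8), with atom counts taken from its molecular formula rather than from evaluate output, and assumes the total atom count includes hydrogens.

```python
def heavy_atom_count(atom_count, hydrogen_count):
    """All atoms minus hydrogens, mirroring atomCount()-atomCount('1')."""
    return atom_count - hydrogen_count


def aromatic_fraction(aromatic_atoms, atom_count, hydrogen_count):
    """Fraction of heavy atoms that are aromatic.

    Returns 0.0 when there are no heavy atoms, to avoid dividing by zero.
    """
    heavy = heavy_atom_count(atom_count, hydrogen_count)
    if heavy == 0:
        return 0.0
    return aromatic_atoms / heavy


# Toluene, C7H8: 15 atoms in total, 8 hydrogens, 6 aromatic ring carbons
print(heavy_atom_count(15, 8))
print(round(aromatic_fraction(6, 15, 8), 3))
```

Here the heavy atom count is 15 - 8 = 7 and the aromatic fraction is 6/7.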

The output file contains the descriptors as ";" separated text.
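
If you want to post-process the results outside Aabel, the ";" separated text is easy to read. The following is a minimal Python sketch, assuming the first line holds the column names (as it does once the AppleScript above has prepended them). The function name and the sample values are invented for illustration and are not real calculated data.

```python
import csv
import io


def read_profile(text):
    """Parse ';'-separated text whose first line holds the column names.

    Returns a list of dicts, one per compound, with values kept as strings.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        rows.append(dict(zip(header, (f.strip() for f in fields))))
    return rows


# Made-up sample in the same layout (only three columns shown)
sample_results = "name; logP; mass\ncpd1;2.31;180.16\n\ncpd2;-0.5;75.07\n"

for row in read_profile(sample_results):
    print(row["name"], float(row["logP"]), float(row["mass"]))
```

Values are left as strings so that non-numeric columns such as an ID or a SMILES string survive unchanged; convert individual columns with `float()` as needed.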

For more information on AppleScript, have a look at the AppleScript Resources Page.