Macs in Chemistry

Insanely great science

AppleScript Tutorial 5

Building a substructure searchable database

Anyone who has had to store or search a collection of chemical structures rapidly realises that they need a software tool with a little chemical intelligence. Whilst there are a number of commercial databases, they tend to be rather expensive: fine for large corporations but not suitable for a single chemist or a small group. Today I'm going to show you how to use OpenBabel to build a substructure-searchable database. Whilst in theory you could use MySQL, I'm going to assume that you don't actually want to become a database administrator, so instead I'm going to use FileMaker, which is not free (there is a free trial) but is incredibly easy to use. We will use OpenBabel to run the search on an external file and pass the results to FileMaker, which will display the selected records. The easiest way to get OpenBabel, if you have not done so already, is to install ChemSpotlight.

First we need some structures. I've provided a file (you will need to unzip it) containing about 650 substituted acetophenones; each line of the file contains a SMILES string and an identification number.
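Before importing, it may be worth a quick sanity check that every line really does have two tab-separated columns. The sketch below assumes the unzipped file is called acetophenones.tab (the name used later in this tutorial) and sits in the current directory; the function name check_columns is made up for illustration.

```shell
#!/bin/sh
# check_columns (hypothetical helper): report how many lines of a
# tab-delimited file do not have exactly two fields (SMILES, ID number).
check_columns() {
    awk -F'\t' '
        NF != 2 { bad++ }    # count lines without exactly two columns
        END     { printf "%d lines, %d malformed\n", NR, bad + 0 }
    ' "$1"
}

# Only run the check if the file is actually present.
if [ -f acetophenones.tab ]; then
    check_columns acetophenones.tab
fi
```

If the second number is not zero, have a look at the file in a text editor before importing it.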

Now open FileMaker and from the File menu select "New Database". Select the "Create a new empty file" radio button, call the file SMILESDatabase.fp7 and save it. We now need to create two fields in the database, SMILES and IDNUM.

smiles_def

Now click OK, and from the FileMaker File menu select Import Records>File, navigate to the SMILES file and import it as shown below.

smiles_import

We now need to set up a related record search within FileMaker. First define another field as before, called Find_List; this time click on the Options button and in the "Storage" tab select "Use global storage (one value for all records)".

smiles_options

We can now set up the relationship. In the Define Database window click on the "Relationships" tab and then click on the edit relationships button (outlined in red below). From the two dropdown menus select SMILESdatabase, then in the first window select "Find_List" and in the second "IDNUM". Click OK and you should be prompted to give the relationship a name; call it "SMILESLink", then click OK.

smiles_relationship

We now need to set up the FileMaker part of the search. Click on "Scripts" in the FileMaker main menu and in the box that appears select "New". Call the script "FindRelated" and from the list on the left select "Go to Related Record". If you then double click on the line in the script box you can modify it to show only related records, select the table "SMILESLink" and display using the current layout.

smiles_script

If you now copy and paste a selection of the IDNUM values into the Find_List box you can see how the related records search works. If you then select "FindRelated" from the Scripts menu, the result should be a "Found Set" of only those records that were listed in the Find_List field.

select_list

We now define another field as before, called SMILES_query; this time click on the Options button and in the "Storage" tab select "Use global storage (one value for all records)". This will be the text string we use to do the substructure search.

We now need to set up the files OpenBabel will use to do the searching. First rename the downloaded file acetophenones.tab to acetophenones.smi: whilst FileMaker needs a tab-delimited file to import, the file is actually a SMILES file (unfortunately the same extension, .smi, is also used for self-mounting images). You now need to decide where you are going to store all the files, since we will need explicit paths to the files to do the searching. For now let's assume you have a folder on your desktop called Chem_Database and into this you have put both acetophenones.smi and SMILES_Database.fp7. OpenBabel can search SMILES files directly but it is MUCH, MUCH faster if you first create a fast search index. To do this you can either use iBabel, a GUI for OpenBabel, or issue this command in the Terminal.

/usr/local/bin/babel /Users/your_user_name/Desktop/Chem_Database/acetophenones.smi -ofs -xfFP2 /Users/your_user_name/Desktop/Chem_Database/acetophenones.fs

This creates a fast search file using FP2 fingerprints, which index linear fragments of up to 7 atoms. The index can be searched using a SMILES string; for example, to identify all records containing iodobenzene, type in the following.

/usr/local/bin/babel /Users/swain/Desktop/Chem_Database/acetophenones.fs -osmi -xt -s'Ic1ccccc1'
3 candidates from fingerprint search phase
ID_NUM_00045320
ID_NUM_00060283
ID_NUM_00094998
3 molecules converted
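If you expect to rebuild the index or repeat searches from the Terminal, the two commands can be wrapped in a small shell script. This is only a sketch: the function names build_index and search_ids are made up for illustration, the paths assume the Chem_Database folder on the desktop described above, and it relies on FP2 being the fingerprint OpenBabel uses when no type is specified.

```shell
#!/bin/sh
# Locations are assumptions; adjust them for your machine.
BABEL="${BABEL:-/usr/local/bin/babel}"
DB_DIR="${DB_DIR:-$HOME/Desktop/Chem_Database}"

# build_index (hypothetical helper): (re)create the fast search index
# next to the SMILES file.
build_index() {
    if [ ! -f "$DB_DIR/acetophenones.smi" ]; then
        echo "Cannot find $DB_DIR/acetophenones.smi" >&2
        return 1
    fi
    # No fingerprint option is given, so the default type is used.
    "$BABEL" "$DB_DIR/acetophenones.smi" -ofs "$DB_DIR/acetophenones.fs"
}

# search_ids (hypothetical helper): print the ID numbers of the records
# that contain the SMILES substructure given as the first argument.
search_ids() {
    if [ -z "$1" ]; then
        echo "usage: search_ids SMILES" >&2
        return 2
    fi
    # The double quotes keep the query as a single argument.
    "$BABEL" "$DB_DIR/acetophenones.fs" -osmi -xt -s"$1"
}
```

With the functions loaded into your shell, search_ids 'Ic1ccccc1' should run the same iodobenzene query as the command above.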

We can use an AppleScript within FileMaker to run this sort of query, put the results into the Find_List field and then run the "FindRelated" script we wrote earlier. So back to the FileMaker database: create a new script called "Substructure_search", scroll down to the bottom of the left-hand list and select "Perform AppleScript". Double click on the command and enter the following AppleScript text, making sure you get all the paths correct for your machine.

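-- Read the query string from the global field
set the_smarts to (cell "SMILES_query" of current record)

-- Build the babel command; the query goes inside single quotes
set the_script to "/usr/local/bin/babel /Users/YOUR_USER_NAME/Desktop/Chem_Database/acetophenones.fs -osmi -xt -s'" & the_smarts & "'"

-- Run the search; the result is the list of matching record ids
set the_results to do shell script the_script --& " || echo ERROR" without altering line endings
--display dialog the_results

-- Put the ids into the global field used by the relationship
set cell "Find_list" to the_results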

This script takes the contents of the cell "SMILES_query" and uses it to construct the shell command "the_script". The do shell script command then calls OpenBabel to do the actual search and returns "the_results" (a list of record ids). This list is then put in "Find_list" and the related records search is run.

search_script
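The commented-out fragment in the AppleScript hints at one way to handle failures: do shell script raises an AppleScript error when the command exits with a non-zero status, and appending || echo ERROR makes the shell return the word ERROR instead. The sketch below shows the same idea in plain shell, together with a check that the query contains no single quote, since a stray quote would end the quoted -s argument early. The function name safe_search and the BABEL and FS variables are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Locations are assumptions; adjust them for your machine.
BABEL="${BABEL:-/usr/local/bin/babel}"
FS="${FS:-$HOME/Desktop/Chem_Database/acetophenones.fs}"

# safe_search (hypothetical helper): run the substructure search and
# print ERROR instead of failing when something goes wrong.
safe_search() {
    case "$1" in
        ''|*\'*)
            # Empty query, or one containing a single quote: refuse it.
            echo ERROR
            return 0
            ;;
    esac
    # Discard babel's messages on standard error; fall back to ERROR
    # if babel exits with a non-zero status.
    "$BABEL" "$FS" -osmi -xt -s"$1" 2>/dev/null || echo ERROR
}
```

Since ERROR will not match any IDNUM, putting it into the Find_List field should simply give no related records.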

In the next Tutorial I'll show how we can add further data and also how we can render the structures.