Adding substructure searching to a FileMaker Pro Database
Anyone who has had to store or search a collection of chemical structures rapidly realises that they need a software tool with a little chemical intelligence. Whilst there are a number of commercial databases they tend to be rather expensive and often require a knowledge of SQL or dedicated IT support. Fine for large corporations but not suitable for a single chemist or small group. In contrast FileMaker Pro is a popular desktop database with an easy to use interface (there are also server and mobile versions). Unfortunately whilst it is easy to use it does not support chemical structure based searching. Fortunately FileMaker Pro comes with an easy to use scripting interface and we can create scripts that run command line applications like Openbabel.
Open Babel: The Open Source Chemistry Toolbox: Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
The best way to install Openbabel is to use Home-brew.
To install a range of cheminformatics packages we can use a custom “tap” created by Matt
brew tap mcs07/cheminformatics
brew install open-babel
There are more details on how to install cheminformatics tools on a Mac here.
To check OpenBabel is working type this in a Terminal window:
obabel -:'C1=CC=CC=C1F' -ocan Fc1ccccc1 1 molecule converted
We will use Openbabel to actually run a search on an external file pass the results to FileMaker which will display the selected results.
Creating the Database
First we need some structures, Ive provided a file containing around 13,000 reagents in SMILES format (you will need to unzip), the file contains the SMILES string and a numeric identifier. This file has UNIX line endings and it is
Ic1ccccc1 1 CC=O 2 CC(O)=O 3 CC(OC(C)=O)=O 4 CC(C)=O 5 CC#N 6 CC(c1ccccc1)=O 7 CC(Br)=O 8 CC(Cl)=O 9 O=C(n1c2c(cc1)cccc2)C 10 CC(c1nccnc1)=O 11
This file has UNIX line endings and it is important to ensure this is the case as shown below (Viewed using BBEdit) as several of the OpenBabel command line tools cause errors when handling Mac line endings.
First open FileMaker and from the File menu select "New Database", select the "Create a new empty file" radio button and call it Reagents.fmp12 and save it. We now need to create two fields in the database, SMILESstring (as a text field) and Recid (as a number).
Now click OK, and from the FileMaker File menu select Import Records>File and navigate to the SMILES file and import as shown below
You can now modify the layout to show a couple of fields
Whilst a SMILES string is a compact and efficient way to store chemical structures in a database most chemists would prefer to see 2D representations of the structures. To do this we first create a container field that will store the 2D image of the molecular structure. Whilst it would be possible to draw all the structures by hand, by using a mixture of FileMaker scripting and Applescript with ChemDraw it is possible to automate the structure generation.
We need two scripts, the first script loops through the database copying the SMILES_string of the current record and then passes the SMILES string to an Applescript that uses ChemDraw to generate the 2D image and copies it to the clipboard. The second script then pastes the image into the Structure container field.
The second script is shown below, it simply pastes the contents of the clipboard into the Structure field.
The first script is shown below, first we go to the first record then loop through all the records, first copying the SMILES string to the clipboard and then performing an Applescript.
The Applescript is shown below, we control the menu items to paste as SMILES to generate the 2D structure, we then select the image then do menu item "Cut", this copies the image to the clipboard and also clears the ChemDraw document. We then bring FileMaker Pro to the front.
tell application "ChemDraw" activate if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit" if enabled of menu item "Select All" then do menu item "Select All" end if do menu item "Cut" end tell tell window "Reagents" of application "Filemaker Pro" activate end tell
The FileMaker script then calls the Paste_CD script to paste the 2D image into the Structure container field.
You can do some simple searching, simply search for "Br" in the SMILES_string field to identify all structures containing bromine, or search for a lower case "c" to find all containing an aromatic carbon. However as the image above shows not all SMILES strings show the aromatic ring in the correct format, many have Kekule form, aliphatic carbons connected by alternating single and double bonds as shown below.
Benzene (aromatic representation) c1ccccc1 Benzene (Kekulé form) C1=CC=CC=C1
There are also a number of other functional groups that can be drawn in different ways, nitro, sulphoxide, sulphone etc. and so we need to standardise the format of the SMILES string.
SMILES canonicalisation using Openbabel
As we showed earlier Openbabel can be used to standardise the SMILES strings and canonicalise them.
obabel -:'C1=CC=CC=C1F' -ocan Fc1ccccc1 1 molecule converted
A SMILES string is a way to represent a 2D molecular graph as a 1D string. In most cases there are many possible SMILES strings for the same structure. Canonicalization is a way to determine which of all possible SMILES will be used as the reference SMILES for a molecular graph.
The FileMaker Pro script to standardise the SMILES strings is shown below, we loop through the records calling OpenBabel using the Applescript shown below.
The Applescript gets the SMILES string of the current record, then constructs the shell script command, including options "xc" to canonicalise, we then use the do shell script command and then copy the returned canonicalised SMILES string into the appropriate cell in the current record.
set the_smiles to "" set ob_smiles to "" set the_smiles to (cell "SMILES_string" of current record) if the_smiles is "" then --do nothing else set the_script to "echo '" & the_smiles & "' | /usr/local/bin/obabel -ismi -osmi -xc" end if try set ob_smiles to (do shell script the_script) —display dialog ob_smiles end try if ob_smiles is "" then --do nothing else set cell "OpenBabel_SMILES" of current record to ob_smiles end if
As you can see below both records now have the same canonical SMILES and we can use this field for searching.
In a similar manner we can also add a field containing the InChiKey by changing the OpenBabel shell script as shown below.
/usr/local/bin/obabel -ismi -oinchikey
InChiKeys are used by a number of online databases to aid searching for example UniChem, we can use the the UniChem web api to create a link
In FileMaker create a new script called "SearchUniChem" as shown below.
We can link this to a button in FileMaker so that clicking searches UniChem for the current record InChiKey and displays the results in your web browser.
The strategy for substructure searching is to use OpenBabel to search a file containing all the structures in the database and the corresponding record ID, OpenBabel returns a list of all the record ID that match the query. We then use the list of Record ID to query FileMaker and select the corresponding records.
To do this we first set up a related record search in FileMaker, first define another field as before called Find_List, this time click on the Options button and in the "Storage" select "Use global storage (one value for all records).
We can now set up the relationship, in the Define Database window click on the "Relationships" tab and then click on the edit relationships button (outlined in red below). From the two dropdown menus select Reagents and in the first window select "FindList" and in the second "Redid, click OK and you should be prompted to give the relationship a name call it "SMILESLink", now click OK.
We now need to set up the FileMaker part of the search, click on "Scripts" in the FileMaker main menu and create a new script. Call the script "Find_Related" and from the list on the right select "Go to Related record", if you then double click on the line in the script box you can modify the result options to Show only related records, select the table "SMILESLink" and display using the current layout.
Now add the FindList field to the layout and populate it with a list of numbers into the FindList box you can see how the related records search works. If you now select "FindRelated" from the scripts menu the result should be a "Found Set" of only those records that were in the Find_List field
The next step is to create the substructure search script. We now define another field as before called SMARTS_Query, this time lick on the Options button and in the "Storage" select "Use global storage (one value for all records). This will be the text string we used to do the substructure search.
SMARTS is a language that allows you to specify substructures using rules that are straightforward extensions of SMILES. For example, to search a database for phenol-containing structures, one would use the SMARTS string [OH]c1ccccc1.
We now need to set up the files OpenBabel will use to do the searching, firstly reagents.smi, whilst FileMaker needs a tab delimited file to import, the file is actually a SMILES file (unfortunately the same extension .smi is used for self-mounting images). You now need to decide where you are going to store all the files since we will need to have explicit paths to the files to do the searching.
The FileMaker script is shown below, calling Openbabel using an Applescript.
The AppleScript is shown below, this script takes the contents of the cell "SMARTSquery" and uses it to construct the shell script "thescript", you will need to edit the shell script to include the full path to the reagents.smi file. The do shell script then calls OpenBabel to do the actual search and returns theresults (a list of record ids). This list is then put in "Findlist" and the related records search is run.
set the_smarts to (cell "SMARTS_query" of current record) display dialog the_smarts set the_script to "/usr/local/bin/obabel /Users/username/Path_to_file/reagents.smi -osmi -xt -s'" & the_smarts & "'" set the_results to do shell script the_script --& " || echo ERROR" without altering line endings set cell "Find_list" to the_results
You can test this with a simple query, c1ccccc1Br should return all molecules containing bromobenzene.
If you are unfamiliar with the SMARTS query language there is an excellent primer on the Daylight site.
Fast Searching using fingerprints.
Whilst to script above works fine for relatively small datasets (<10,000's) and it is possible to do substructure and similarity searches without the need to build a similarity index, however larger datasets (up to 100,000's) can be searched rapidly once a fastsearch index has been built. On larger datasets it is necessary to first build a fastsearch index. This is a new file that stores a database of fingerprints for the files indexed. You will still need to keep both the new .fs fastsearch index and the original files. However, the new index will allow significantly faster searching and similarity comparisons.
To create the fastsearch index you can either use iBabel a GUI for Openbabel or issue this command in the "Terminal". Remember to change the file paths to the reagents.smi file.
/usr/local/bin/obabel '/Users/username/Path_to_file/reagents.smi' -ofs -xFP2 '/Users/username/Path_to_file/reagents.fs' This will prepare an index of /Users/username/Path_to_file/reagents.smi and may take some time... It contains 13590 molecules Estimated completion time 4.85289 seconds It took 7.64828 seconds 13590 molecules converted
This command creates a fast search file using FP2 (the default fingerprint) which are fingerprints that Indexes linear fragments up to 7 atoms. This can be searched using a SMARTS string, for example to identify all records containing Bromobenzene type in the following.
/usr/local/bin/obabel /Users/username/Path_to_file/reagents.fs -osmi -xt -s'c1ccccc1Br' 576 candidates from fingerprint search phase 18 160 167 168 169 275 276 277 278 …
We can use this in a FileMaker script called substructuresearchfs.
Where the AppleScript is
set the_smarts to (cell "SMARTS_query" of current record) display dialog the_smarts set the_script to "/usr/local/bin/babel /Users/username/Path_to_file/reagents.fs -osmi -xt -s'" & the_smarts & "'" set the_results to do shell script the_script --& " || echo ERROR" without altering line endings —display dialog the_results set cell "Find_list" to the_results
The results should be the same as the OpenBabel substructure search, just obtained orders of magnitude faster.
We can also use the fast search index to conduct similarity searches. We first create a new global numeric field in the database called "Sim_num" and modify the AppleScript to include the desired similarity score.
set the_smarts to (cell "SMARTS_query" of current record) set the_num to (cell "Sim_num" of current record) set the_script to "/usr/local/bin/babel /Users/username/Path_to_file/reagents.fs -osmi -xt -s'" & the_smarts & "' -at" & the_num —display dialog the_script set the_results to do shell script the_script --& " || echo ERROR" without altering line endings —Display dialog the_results set cell "Find_list" to the_results
Adding a structure Editor
In this example we are going to use the excellent Ketcher
This needs to be downloaded and installed in the same folder as the database.
First we need to create a new global container "KetcherWebView"
Then create a new layout and add the container field, double-click on the field in layout mode and set it up as a Web View pointing to a custom web address, the file "demo.html" will be the web page containing Ketcher.
The demo.html page needs to contain this html, in this example we are only using the getSMILES but it could be extended to other methods. This assumes that Ketcher is in the same folder.