Macs in Chemistry

Insanely great science

 

Adding substructure searching to a FileMaker Pro Database

Anyone who has had to store or search a collection of chemical structures rapidly realises that they need a software tool with a little chemical intelligence. Whilst there are a number of commercial databases they tend to be rather expensive and often require a knowledge of SQL or dedicated IT support. Fine for large corporations but not suitable for a single chemist or small group. In contrast FileMaker Pro is a popular desktop database with an easy to use interface (there are also server and mobile versions). Unfortunately whilst it is easy to use it does not support chemical structure based searching. Fortunately FileMaker Pro comes with an easy to use scripting interface and we can create scripts that run command line applications like Openbabel.

Open Babel: The Open Source Chemistry Toolbox: Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.

The best way to install Openbabel is to use Home-brew.

To install a range of cheminformatics packages we can use a custom “tap” created by Matt

brew tap mcs07/cheminformatics

Then run

brew install open-babel

There are more details on how to install cheminformatics tools on a Mac here.

To check OpenBabel is working type this in a Terminal window:

obabel -:'C1=CC=CC=C1F' -ocan 
Fc1ccccc1   
1 molecule converted

We will use Openbabel to actually run a search on an external file pass the results to FileMaker which will display the selected results.

Creating the Database

First we need some structures, Ive provided a file containing around 13,000 reagents in SMILES format (you will need to unzip), the file contains the SMILES string and a numeric identifier. This file has UNIX line endings and it is

Ic1ccccc1   1
CC=O    2
CC(O)=O     3
CC(OC(C)=O)=O   4
CC(C)=O     5
CC#N    6
CC(c1ccccc1)=O  7
CC(Br)=O    8
CC(Cl)=O    9
O=C(n1c2c(cc1)cccc2)C   10
CC(c1nccnc1)=O  11

This file has UNIX line endings and it is important to ensure this is the case as shown below (Viewed using BBEdit) as several of the OpenBabel command line tools cause errors when handling Mac line endings.

Unix_fileformat

First open FileMaker and from the File menu select "New Database", select the "Create a new empty file" radio button and call it Reagents.fmp12 and save it. We now need to create two fields in the database, SMILESstring (as a text field) and Recid (as a number).

CreateFields

Now click OK, and from the FileMaker File menu select Import Records>File and navigate to the SMILES file and import as shown below

import

You can now modify the layout to show a couple of fields

layout

Whilst a SMILES string is a compact and efficient way to store chemical structures in a database most chemists would prefer to see 2D representations of the structures. To do this we first create a container field that will store the 2D image of the molecular structure. Whilst it would be possible to draw all the structures by hand, by using a mixture of FileMaker scripting and Applescript with ChemDraw it is possible to automate the structure generation.

We need two scripts, the first script loops through the database copying the SMILES_string of the current record and then passes the SMILES string to an Applescript that uses ChemDraw to generate the 2D image and copies it to the clipboard. The second script then pastes the image into the Structure container field.

The second script is shown below, it simply pastes the contents of the clipboard into the Structure field.

PasteCD

The first script is shown below, first we go to the first record then loop through all the records, first copying the SMILES string to the clipboard and then performing an Applescript.

ChemDraw_from_SMILES

The Applescript is shown below, we control the menu items to paste as SMILES to generate the 2D structure, we then select the image then do menu item "Cut", this copies the image to the clipboard and also clears the ChemDraw document. We then bring FileMaker Pro to the front.

tell application "ChemDraw"
activate
if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"

            if enabled of menu item "Select All" then
                do menu item "Select All"
            end if

        do menu item "Cut"

        end tell
        tell window "Reagents" of application "Filemaker Pro"
activate

end tell

The FileMaker script then calls the Paste_CD script to paste the 2D image into the Structure container field.

Stuctures

Simple Searching

You can do some simple searching, simply search for "Br" in the SMILES_string field to identify all structures containing bromine, or search for a lower case "c" to find all containing an aromatic carbon. However as the image above shows not all SMILES strings show the aromatic ring in the correct format, many have Kekule form, aliphatic carbons connected by alternating single and double bonds as shown below.

Benzene (aromatic representation) c1ccccc1 Benzene (Kekulé form) C1=CC=CC=C1

There are also a number of other functional groups that can be drawn in different ways, nitro, sulphoxide, sulphone etc. and so we need to standardise the format of the SMILES string.

SMILES canonicalisation using Openbabel

As we showed earlier Openbabel can be used to standardise the SMILES strings and canonicalise them.

obabel -:'C1=CC=CC=C1F' -ocan 
Fc1ccccc1   
1 molecule converted

A SMILES string is a way to represent a 2D molecular graph as a 1D string. In most cases there are many possible SMILES strings for the same structure. Canonicalization is a way to determine which of all possible SMILES will be used as the reference SMILES for a molecular graph.

The FileMaker Pro script to standardise the SMILES strings is shown below, we loop through the records calling OpenBabel using the Applescript shown below.

obscript

The Applescript gets the SMILES string of the current record, then constructs the shell script command, including options "xc" to canonicalise, we then use the do shell script command and then copy the returned canonicalised SMILES string into the appropriate cell in the current record.

set the_smiles to ""
set ob_smiles to ""

set the_smiles to (cell "SMILES_string" of current record)

if the_smiles is "" then 
--do nothing
else
set the_script to "echo '" & the_smiles & "' | /usr/local/bin/obabel  -ismi -osmi -xc"

end if

try
set ob_smiles to (do shell script the_script)

—display dialog ob_smiles

end try

if ob_smiles is "" then
--do nothing
else
set cell "OpenBabel_SMILES" of current record to ob_smiles
end if

As you can see below both records now have the same canonical SMILES and we can use this field for searching.

canonicalSMILES

In a similar manner we can also add a field containing the InChiKey by changing the OpenBabel shell script as shown below.

/usr/local/bin/obabel  -ismi -oinchikey

InChiKeys are used by a number of online databases to aid searching for example UniChem, we can use the the UniChem web api to create a link

In FileMaker create a new script called "SearchUniChem" as shown below.

SearchUniChem

We can link this to a button in FileMaker so that clicking searches UniChem for the current record InChiKey and displays the results in your web browser.

UnichemResults

Structure-based searching

The strategy for substructure searching is to use OpenBabel to search a file containing all the structures in the database and the corresponding record ID, OpenBabel returns a list of all the record ID that match the query. We then use the list of Record ID to query FileMaker and select the corresponding records.

To do this we first set up a related record search in FileMaker, first define another field as before called Find_List, this time click on the Options button and in the "Storage" select "Use global storage (one value for all records).

RelatedField

We can now set up the relationship, in the Define Database window click on the "Relationships" tab and then click on the edit relationships button (outlined in red below). From the two dropdown menus select Reagents and in the first window select "FindList" and in the second "Redid, click OK and you should be prompted to give the relationship a name call it "SMILESLink", now click OK.

Relationship

We now need to set up the FileMaker part of the search, click on "Scripts" in the FileMaker main menu and create a new script. Call the script "Find_Related" and from the list on the right select "Go to Related record", if you then double click on the line in the script box you can modify the result options to Show only related records, select the table "SMILESLink" and display using the current layout.

FindRelatedScript

Now add the FindList field to the layout and populate it with a list of numbers into the FindList box you can see how the related records search works. If you now select "FindRelated" from the scripts menu the result should be a "Found Set" of only those records that were in the Find_List field

Findlistsearch

The next step is to create the substructure search script. We now define another field as before called SMARTS_Query, this time lick on the Options button and in the "Storage" select "Use global storage (one value for all records). This will be the text string we used to do the substructure search.

SMARTS is a language that allows you to specify substructures using rules that are straightforward extensions of SMILES. For example, to search a database for phenol-containing structures, one would use the SMARTS string [OH]c1ccccc1.

We now need to set up the files OpenBabel will use to do the searching, firstly reagents.smi, whilst FileMaker needs a tab delimited file to import, the file is actually a SMILES file (unfortunately the same extension .smi is used for self-mounting images). You now need to decide where you are going to store all the files since we will need to have explicit paths to the files to do the searching.

The FileMaker script is shown below, calling Openbabel using an Applescript.

ob_search

The AppleScript is shown below, this script takes the contents of the cell "SMARTSquery" and uses it to construct the shell script "thescript", you will need to edit the shell script to include the full path to the reagents.smi file. The do shell script then calls OpenBabel to do the actual search and returns theresults (a list of record ids). This list is then put in "Findlist" and the related records search is run.

set the_smarts to (cell "SMARTS_query" of current record)

display dialog the_smarts

set the_script to "/usr/local/bin/obabel /Users/username/Path_to_file/reagents.smi -osmi -xt -s'" & the_smarts & "'"


set the_results to do shell script the_script --& " || echo ERROR" without altering line endings

set cell "Find_list" to the_results

You can test this with a simple query, c1ccccc1Br should return all molecules containing bromobenzene.

If you are unfamiliar with the SMARTS query language there is an excellent primer on the Daylight site.

Fast Searching using fingerprints.

Whilst to script above works fine for relatively small datasets (<10,000's) and it is possible to do substructure and similarity searches without the need to build a similarity index, however larger datasets (up to 100,000's) can be searched rapidly once a fastsearch index has been built. On larger datasets it is necessary to first build a fastsearch index. This is a new file that stores a database of fingerprints for the files indexed. You will still need to keep both the new .fs fastsearch index and the original files. However, the new index will allow significantly faster searching and similarity comparisons.

To create the fastsearch index you can either use iBabel a GUI for Openbabel or issue this command in the "Terminal". Remember to change the file paths to the reagents.smi file.

/usr/local/bin/obabel   '/Users/username/Path_to_file/reagents.smi' -ofs -xFP2   '/Users/username/Path_to_file/reagents.fs' 
This will prepare an index of /Users/username/Path_to_file/reagents.smi and may take some time...
It contains 13590 molecules Estimated completion time 4.85289 seconds

 It took 7.64828 seconds
13590 molecules converted

This command creates a fast search file using FP2 (the default fingerprint) which are fingerprints that Indexes linear fragments up to 7 atoms. This can be searched using a SMARTS string, for example to identify all records containing Bromobenzene type in the following.

/usr/local/bin/obabel /Users/username/Path_to_file/reagents.fs -osmi -xt -s'c1ccccc1Br'
576 candidates from fingerprint search phase
18
160
167
168
169
275
276
277
278
&hellip;

We can use this in a FileMaker script called substructuresearchfs.

fs_search

Where the AppleScript is

set the_smarts to (cell "SMARTS_query" of current record)

display dialog the_smarts

set the_script to "/usr/local/bin/babel /Users/username/Path_to_file/reagents.fs -osmi -xt -s'" & the_smarts & "'"


set the_results to do shell script the_script --& " || echo ERROR" without altering line endings

&mdash;display dialog the_results

set cell "Find_list" to the_results

The results should be the same as the OpenBabel substructure search, just obtained orders of magnitude faster.

We can also use the fast search index to conduct similarity searches. We first create a new global numeric field in the database called "Sim_num" and modify the AppleScript to include the desired similarity score.

set the_smarts to (cell "SMARTS_query" of current record)
set the_num to (cell "Sim_num" of current record)

set the_script to "/usr/local/bin/babel /Users/username/Path_to_file/reagents.fs -osmi -xt -s'" & the_smarts & "'  -at" & the_num


&mdash;display dialog the_script

set the_results to do shell script the_script --& " || echo ERROR" without altering line endings
&mdash;Display dialog  the_results

set cell "Find_list" to the_results

Adding a structure Editor

As it stand you can use you favourite structural editor as long as it provides a means for generating a SMILES/SMARTS query string. It would be nice however to embed the structure editor within the FileMaker application, this was a very difficult until the recent development of javascript based editors. By adding a web view we can display the editor within FileMaker and then use Javascript to communicate between the editor and FileMaker.

In this example we are going to use the excellent Ketcher

Since Ketcher is written in pure Javascript, it incorporates high performance, good portability and light weight. You will not need any Java or Flash plugins to use it in your browser. Ketcher is completely free and open-source, while also available on a commercial basis. You can easily integrate Ketcher into your web application. The process of integration is very simple and well-documented.

This needs to be downloaded and installed in the same folder as the database.

First we need to create a new global container "KetcherWebView"

KetcherWebView

Then create a new layout and add the container field, double-click on the field in layout mode and set it up as a Web View pointing to a custom web address, the file "demo.html" will be the web page containing Ketcher.

webviewsetup

The demo.html page needs to contain this html, in this example we are only using the getSMILES but it could be extended to other methods. This assumes that Ketcher is in the same folder.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <script type="text/javascript" src="ketcher-master/prototype-min.js"></script>
    <script type="text/javascript">
        function getKetcher()
        {
            var frame = null;

            if ('frames' in window && 'ketcherFrame' in window.frames)
                frame = window.frames['ketcherFrame'];
            else
                    return null;

            if ('window' in frame)
                return frame.window.ketcher;
        }

            function getSmiles()
        {
            var ketcher = getKetcher();

            if (ketcher)
                $('mySMILES').value = encodeURIComponent(ketcher.getSmiles());

        }

        function getMolfile()
        {
            var ketcher = getKetcher();

            if (ketcher)
                $('textarea').value = ketcher.getMolfile();
        }

        var row = 1;

        function render()
        {
            var molfile = $('textarea').value;

            var smiles = molfile.strip();

            if (smiles == '' || smiles.indexOf('\n') == -1)
            {
                alert("Please, input Molfile");
                return;
            }

            var renderOpts = {
                'autoScale':true,
                'debug':true,
                'autoScaleMargin':20,
                'ignoreMouseEvents':true
            };


        }
        function submitForm() 

        {
                theParam = document.getElementById('mySMILES').value;
                theURL = "fmp://$/Reagents?script=WebQuery&param=" + theParam;
                window.location = theURL ;
                    document.getElementById('mySMILES').value = "";
                return false;
        }

    </script>


</head>
<body>


<div style="width:80%;padding:0px;margin:auto;">
    <div>
        <iframe id="ketcherFrame" name="ketcherFrame" src="ketcher-master/ketcher.html" width="840" height="500"></iframe>

            <input type="button" style="margin:10px" value="Get SMILES" onclick="getSmiles()"></input><br/>
            SMILES query:
            <form onsubmit="return submitForm()">
            <input id="mySMILES" type="text" />
            <input type="submit" value="Submit" />
            </form>
        </div>
    </div>
</div>

If all is set up correctly you should now see this in the FileMaker layout. Draw a structure and press the Get Smiles button, this should populate the SMILES query box. You will notice that we have to encode the SMILES query to make sure it is URL safe before we can use javascript to transfer the query to FileMaker. You can then run the substructure or similarity search as before.

ketcher

Common Issues

The path to obabel is incorrect, depending on the method by which Openbabel was installed it can be installed in different locations. To check where it is on your machine in the Terminal type to get the path to obabel

which obabel
/usr/local/bin/obabel

The SMILES file containing structures needs to have UNIX line endings, if your input file does not have UNIX line endings you can either use a text editor like BBEdit to change it to UNIX format of run this command in the Terminal

cat input_file.txt | tr '\r' '\n' | tr -s '\n' > unix_file.txt

The file paths all need to be in the following format

/Users/username/Path_to_file/reagents.fs

If you are not sure of the file path simply drag and drop the file onto a Terminal window to get the path.

SMILES queries, if you draw an aromatic ring in the Kekule form the search (C1CCCCC1) will only flag those structures that are in that form and not the aromatic (c1ccccc1).

Download

You download the files described in this tutorial here https://www.macinchem.org/reviews/FMPro/Myreagents.zip, this includes the database, all the scripts, Ketcher and the html page. For this to work you need to have the folder on your desktop, if you want to move it elsewhere you will need to edit the file paths in the scripts.

Last Updated 18 June 2019