iBabel 4.0: a Cocoa/Swift rewrite
Why the rewrite?
iBabel is a graphical user interface (GUI) to the open-source cheminformatics toolkit Open Babel, described in the J Cheminformatics article "Open Babel: An open chemical toolbox" DOI. iBabel was originally written as an AppleScript Studio application, which underwent several updates.
In addition, Open Babel was substantially rewritten for version 3.0. This brought a lot of changes, breaking the API in many places and replacing the "babel" executable with "obabel". Some of the small programs/tools built on OpenBabel are no longer supported. This functionality has not been lost, however; some of it has now been incorporated into the main OpenBabel program.
So rather than try to patch the AppleScript Studio/AppleScriptObjC application I decided to start afresh. This has meant some features have been lost and some new ones added, but hopefully the core features are still there.
Before using iBabel you need to first install Open Babel
OpenBabel can be installed using Homebrew:
brew install open-babel
or using conda:
conda install -c openbabel openbabel
You could also compile it yourself; there are detailed instructions in Compiling Open Babel.
Installing and starting
You can download iBabel.zip here. It has only been tested with macOS 10.14 and higher. It should uncompress automatically, and you can then install it into the Applications folder. The first time you open the application you may get a warning about iBabel not being notarized.
Depending on which route you choose, Open Babel could be installed in a variety of locations. iBabel needs to know where to look, so the first time you start the iBabel application you will be presented with the dialog below.
In a Terminal window type "which obabel". This will return the path to the obabel executable, as shown below. Copy the path to the clipboard.
which obabel
/Users/chrisswain/miniconda3/bin/obabel
Hit OK and then open the iBabel preferences, available from the iBabel menu in the top menu bar or with the keyboard shortcut ⌘, (Command-comma).
Then paste the path into the text box and hit OK. iBabel will now use this version of obabel; if you have multiple versions of obabel installed you can change this later. The main iBabel interface is divided into five tabs: "Convert", "Search", "Tools", "Viewer" and "Seeker". There is extensive tooltip support: hovering over an interface item (button, field, menu) gives a brief description.
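If you prefer to script the "which obabel" lookup, the sketch below does the same thing in Python. The list of extra directories is my assumption about common install locations on a Mac; it is not something iBabel itself searches.

```python
import os
import shutil

# Common install locations on a Mac. This list is an assumption made for
# illustration; adjust it to match your own Homebrew or conda setup.
COMMON_DIRS = [
    "/opt/homebrew/bin",                     # Homebrew on Apple silicon
    "/usr/local/bin",                        # Homebrew on Intel Macs
    os.path.expanduser("~/miniconda3/bin"),  # default Miniconda location
]


def find_obabel(search_path=None):
    """Return the full path to the obabel executable, or None if not found."""
    if search_path is None:
        # Look on the normal PATH first, then in the common locations.
        search_path = os.pathsep.join([os.environ.get("PATH", "")] + COMMON_DIRS)
    return shutil.which("obabel", path=search_path)


print(find_obabel() or "obabel not found; install Open Babel first")
```

The printed path is the value to paste into the iBabel preferences.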
The Convert Tab
The Convert tab gives access to the core OpenBabel file conversion tools. OpenBabel has support for 145 formats in total. It can read 107 formats and can write 107 formats; some of the formats also have additional read and write options. Clicking on the "Input File" button opens a modal sheet allowing the user to navigate to and select the input file. Whilst obabel will attempt to identify the file type based on the extension, this can be set explicitly by selecting from the dropdown menu. For some input file types there are also additional options that can be selected from the dropdown menu.
The "Output File" button allows the user to select the output file and offers similar choices for filetype and options, as shown below.
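For readers who want to see what these choices look like on the command line, here is a minimal sketch of building a basic obabel conversion command. The function name and parameters are hypothetical; the -i, -o and -O options are the standard obabel ones, but the exact command iBabel generates may differ.

```python
def build_convert_command(infile, outfile, in_format=None, out_format=None,
                          obabel="obabel"):
    """Return the argument list for a basic obabel file conversion."""
    cmd = [obabel]
    if in_format:
        cmd.append(f"-i{in_format}")   # explicit input format, e.g. -isdf
    cmd.append(infile)
    if out_format:
        cmd.append(f"-o{out_format}")  # explicit output format, e.g. -osmi
    cmd += ["-O", outfile]             # -O names the output file
    return cmd


# Formats left as None are guessed by obabel from the file extensions.
print(build_convert_command("library.sdf", "library.smi"))
print(build_convert_command("library.txt", "library.out", "sdf", "smi"))
```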
The "Count" button determines the number of molecules in a file and posts the result to the obabel command text box.
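One way to count molecules from a script is to convert to the "nul" output format, which writes nothing, and read the "N molecules converted" message that obabel reports. This is a sketch of that approach, not necessarily how the "Count" button is implemented; the function names are hypothetical.

```python
import re


def build_count_command(infile, obabel="obabel"):
    """Argument list that reads every molecule but writes no output file."""
    return [obabel, infile, "-onul"]  # "nul" is the output-nothing format


def parse_converted_count(messages):
    """Extract N from obabel's 'N molecules converted' message, or None."""
    match = re.search(r"(\d+) molecules? converted", messages)
    return int(match.group(1)) if match else None


print(build_count_command("library.sdf"))
print(parse_converted_count("50000 molecules converted"))
```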
The "List tags" button searches through SDF files and reports the list of identified fields in the obabel command text box, as shown below.
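To illustrate what a tag is, the sketch below collects the field names from the data header lines of an SD file (lines of the form ">  <TAG>") in plain Python. It is an independent illustration that does not use Open Babel, and it is not the code behind the "List tags" button.

```python
import re

# A data header line in an SD file starts with ">" and carries the field
# name in angle brackets, e.g. ">  <MW>".
TAG_LINE = re.compile(r"^>.*<([^<>]+)>")


def list_sdf_tags(sdf_text):
    """Return the distinct data field names in order of first appearance."""
    tags = []
    for line in sdf_text.splitlines():
        match = TAG_LINE.match(line)
        if match and match.group(1) not in tags:
            tags.append(match.group(1))
    return tags


sample = """mol1

M  END
>  <Name>
aspirin

>  <MW>
180.16

$$$$
mol2

M  END
>  <Name>
ibuprofen

$$$$
"""
print(list_sdf_tags(sample))
```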
The rest of the checkboxes offer additional obabel options.
- Center atomic coordinates at (0,0,0)
- Convert dative bonds (e.g. [N+]([O-])=O to N(=O)=O)
- Do not convert duplicate molecules
- Generate 2D coordinates
- Add or replace molecular title with text from the box for every molecule
- Join all input molecules into a single output molecule entry
- Split multi molecular input into individual files
- Canonicalize the atom order
- Generate 3D coordinates using default settings
- Compress the output with gzip
- Remove all but the largest contiguous fragment (strip salts)
- Append to title; the list can contain a property from the file (e.g. "Name"), a calculated property such as MW, or both.
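The checkboxes above correspond to obabel command-line options. The mapping below is my reading of the Open Babel documentation rather than iBabel's source, so treat it as a sketch; in particular, using -m for the split option and passing the append list as a space-separated string are assumptions. The dictionary keys are hypothetical names.

```python
# Hypothetical checkbox names mapped to obabel options.
OPTION_FLAGS = {
    "center": ["-c"],              # center coordinates at (0,0,0)
    "dative_bonds": ["-b"],        # convert dative bonds
    "unique": ["--unique"],        # do not convert duplicate molecules
    "gen2d": ["--gen2D"],          # generate 2D coordinates
    "join": ["-j"],                # join all input molecules into one entry
    "split": ["-m"],               # assumed: one output file per molecule
    "canonical": ["--canonical"],  # canonicalize the atom order
    "gen3d": ["--gen3D"],          # generate 3D coordinates
    "gzip": ["-z"],                # compress the output with gzip
    "strip_salts": ["-r"],         # keep only the largest fragment
}


def option_arguments(selected, title=None, append=None):
    """Translate ticked options into a list of obabel arguments."""
    args = []
    for name in selected:
        args += OPTION_FLAGS[name]
    if title is not None:
        args += ["--title", title]              # add or replace the title
    if append:
        args += ["--append", " ".join(append)]  # properties appended to title
    return args


print(option_arguments(["gen2d", "strip_salts"], append=["Name", "MW"]))
```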
The filter option allows selection of a subset of molecules based either on properties imported with the molecule (for example from an SDF file) or on calculations made by Open Babel on the molecule.
You can filter on two types of property:
• An SDF property, such as the identifier ROTATABLE_BOND. There is no need for it to be previously known to Open Babel; you can list the available fields in the SDF file using the "List tags" button as described above.
• A descriptor name (internally, an ID of an OBDescriptor object). This is a plug-in class so that new objects can easily be added. MW is the ID of a descriptor which calculates molecular weight. You can see a list of available descriptors using the command obabel -L descriptors.
Clicking on the question mark button gives more details. Open Babel uses an SDF-like property (internally this is stored in the class OBPairData) in preference to a descriptor if one exists in the molecule (e.g. LogP). There are more details in the OpenBabel documentation.
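As a rough illustration of the filter option, the sketch below assembles a --filter argument from individual tests. The helper name is hypothetical, and the example thresholds are arbitrary values chosen for illustration.

```python
def filter_arguments(*conditions):
    """Combine individual tests into a single obabel --filter argument."""
    if not conditions:
        return []
    # obabel expects all tests in one string, separated by spaces.
    return ["--filter", " ".join(conditions)]


# A descriptor test (MW) alongside a test on an SDF field (ROTATABLE_BOND).
cmd = ["obabel", "library.sdf", "-O", "subset.sdf"]
cmd += filter_arguments("MW<300", "ROTATABLE_BOND>2")
print(cmd)
```

When the command is run through a shell, the filter string needs quoting because it contains the characters < and >.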
The "hydrogens" dropdown menu gives three options:
- Add hydrogens (make all hydrogen explicit)
- Delete hydrogens (make all hydrogen implicit)
- Add hydrogens appropriate for pH
By default all molecules are converted; alternatively, a range of molecules can be chosen using the from and to options.
Or you can filter the molecules that are converted based on a SMARTS pattern, with the option that the molecules either have to match or not match.
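The hydrogen, range and SMARTS choices map onto the obabel options -h, -d, -p, -f, -l, -s and -v. The sketch below shows one way to assemble them; the function and its parameter names are hypothetical, and the default pH of 7.4 is just an illustrative value.

```python
def selection_arguments(hydrogens=None, ph=7.4, first=None, last=None,
                        smarts=None, matching=True):
    """Build obabel arguments for hydrogens, molecule range and SMARTS."""
    args = []
    if hydrogens == "add":
        args.append("-h")                 # make all hydrogens explicit
    elif hydrogens == "delete":
        args.append("-d")                 # make all hydrogens implicit
    elif hydrogens == "ph":
        args += ["-p", str(ph)]           # add hydrogens appropriate for pH
    elif hydrogens is not None:
        raise ValueError(f"unknown hydrogens option: {hydrogens!r}")
    if first is not None:
        args += ["-f", str(first)]        # first molecule to convert
    if last is not None:
        args += ["-l", str(last)]         # last molecule to convert
    if smarts:
        # -s keeps molecules that match, -v keeps those that do not.
        args += ["-s" if matching else "-v", smarts]
    return args


print(selection_arguments(hydrogens="ph", first=1, last=100,
                          smarts="c1ccccc1", matching=False))
```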
Clicking the "Check" button generates the obabel command and posts it into the obabel command text box. This is really useful for error checking, but it also allows more experienced users to customise the command, adding wild cards etc. The resulting command can then be run by clicking the "Run" button.
The "Convert" button runs the file conversion.
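The check-then-run pattern is easy to reproduce in a script: first render the command as a shell-quoted string for inspection, then execute it. This is a minimal sketch using the Python standard library; the function names are hypothetical and it makes no claim about how iBabel launches obabel.

```python
import shlex
import subprocess


def check(cmd):
    """Return the command as a single shell-quoted string for inspection."""
    return shlex.join(cmd)


def run(cmd):
    """Execute the command and return (exit code, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


cmd = ["obabel", "in.sdf", "-O", "my out.smi", "--filter", "MW<300"]
print(check(cmd))  # inspect the command first; call run(cmd) once it looks right
```

Passing the arguments as a list avoids the shell entirely, so file names containing spaces need no extra quoting when the command is run.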
The Search Tab
The Search tab provides access to substructure and similarity searching. Clicking on the Editor button (bottom right) brings up the inbuilt JSME editor kindly provided by Peter Ertl and Bruno Bienfait to generate the queries.
OpenBabel offers two strategies for substructure searching. The first uses obabel to search the input file directly with a query SMARTS; this works fine for files containing up to a few thousand molecules. For larger datasets the better approach is to create a "fast search index" and then to use that for the searches.
For smaller datasets, choose the input and output files and, if needed, add the filetype and options. Use the editor to draw the query structure, then click on the yellow smiley face in the top left of the editor window and copy and paste the SMILES text string into the "query SMARTS" text field. Choose whether you want matching or non-matching molecules, then click the "Search" button.
What are SMARTS?
SMILES arbitrary target specification (SMARTS) is a language for specifying substructural patterns in molecules. The SMARTS line notation is expressive and allows extremely precise and transparent substructural specification and atom typing. SMARTS is related to the SMILES line notation that is used to encode molecular structures and, like SMILES, was originally developed by David Weininger and colleagues at Daylight Chemical Information Systems. However, it also includes wild cards for atoms and bonds and descriptors for connectivity and ring systems, and it allows logical operations.
It is beyond the scope of this article to describe SMARTS in detail, but here are a few useful resources.
For larger datasets, and this can be up to millions of structures, we first need to generate the fast search index. This is a new file that stores a database of fingerprints for the files indexed. You will still need to keep both the new .fs fastsearch index and the original files. However, the new index will allow significantly faster searching and similarity comparisons.
To create the fast search index, first select an input file as before, then choose where you want the "Fastsearch file" to be saved, then click the "Create" button. For very large files this may take some time.
To run the substructure search, complete the "query SMARTS" text field, ensure the path to the Fastsearch file is correct and then click the "Search" button.
By default a substructure search is run; however, if the "sim" text box is filled a similarity search is conducted. You need to enter a similarity value between 0 and 1, with higher numbers being more similar.
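The index and search steps can be sketched as obabel commands as follows. Writing to a file with the .fs extension creates the index, and the fastsearch read option t (written -at followed by the value) requests a similarity search with a minimum Tanimoto threshold. The function names are hypothetical, and the exact commands iBabel issues may differ.

```python
def build_index_command(infile, index_file, obabel="obabel"):
    """Create a fastsearch index; the .fs extension selects the format."""
    return [obabel, infile, "-O", index_file]


def build_search_command(index_file, outfile, query, similarity=None,
                         obabel="obabel"):
    """Search the index by substructure, or by similarity if a value is set."""
    cmd = [obabel, index_file, "-O", outfile, "-s", query]
    if similarity is not None:
        if not 0 <= similarity <= 1:
            raise ValueError("similarity must be between 0 and 1")
        cmd.append(f"-at{similarity}")  # minimum Tanimoto similarity
    return cmd


print(build_index_command("library.sdf", "library.fs"))
print(build_search_command("library.fs", "hits.sdf", "c1ccccc1"))
print(build_search_command("library.fs", "similar.sdf", "c1ccccc1", 0.7))
```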
The Tools Tab
The Tools tab gives access to a variety of different tools that have been built using the OpenBabel toolkit. These include tools for conformer generation, 3D minimisation, molecular alignment and tautomerisation. Whilst originally these were all standalone tools, a number have now been incorporated into the obabel executable. In those cases the obtool now uses the version in the obabel executable rather than the standalone tool. A list of the available tools is available from the dropdown menu.
Each of the tools has a separate set of options, so selecting a tool results in the appropriate options being displayed; a few examples are shown below. Unfortunately there are no man pages for the tools, but hopefully the descriptions are sufficient.
Spectrophores are novel descriptors that are calculated from the three-dimensional atomic properties of molecules DOI.
This technology allows the accurate description of molecules in terms of their surface properties or fields. Comparison of molecules’ property fields provides a robust structure-independent method of aligning actives from different chemical classes. When applied to molecules such as ligands and drugs, Spectrophores can be used as powerful molecular descriptors in the fields of cheminformatics, virtual screening, and QSAR modeling.
The technology is described in the OpenBabel documentation. A Spectrophore is calculated by surrounding the three-dimensional conformation of the molecule by a three-dimensional arrangement of points, followed by calculating the interaction between each of the atom properties and the surrounding points. From the obspectrophore tool panel, first select the input file (this needs to contain 3D structures), then click to generate the spectrophore file; for files containing a large number of structures this may take some time. Once the spectrophore file has been generated you can compare a new structure with it.
First select the query molecule, then click Run. The file path to the search results will be shown in the box, the format is shown below, where smaller scores are more similar.
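To give a feel for why smaller scores mean more similar, the sketch below ranks molecules by the Euclidean distance between descriptor vectors. The choice of Euclidean distance is an assumption on my part, and the short vectors are made-up toy values, not real Spectrophores.

```python
import math


def spectrophore_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, library):
    """Return (name, distance) pairs, most similar (smallest) first."""
    scored = [(name, spectrophore_distance(query, vec))
              for name, vec in library.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy three-value vectors, for illustration only.
library = {"mol_a": [1.0, 2.0, 3.0], "mol_b": [1.0, 2.0, 2.5]}
print(rank_by_similarity([1.0, 2.0, 2.4], library))
```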
The Viewer Tab
The Viewer tab provides a means to search, filter, select, view and export multi molecular files. To import a file, select it using the "Input file" button, then click on the "Import" button. The table will be populated with the number of the structure in the file, the name taken from the title field, and the file path (to keep track of which file contains a particular structure). I don't know the maximum number of structures you can import; in the example below I used a file of 50,000 structures from ZINC and the import took a couple of minutes. Clicking on a record displays the structure in the right-hand panel.
To remove all records click on the "Del All" button, and as before clicking on the "Editor" button brings up the JSME editor.
You can filter the selection using substructure searching, by entering a query in the SMARTS query text box and clicking the "Filter" button. The table will then only display the records that match the query; to restore all records to the display click the "Remove" button. The portion of the molecule that matches the SMARTS will be highlighted in red.
To search the molName field simply type into the search box with the magnifying glass icon. You can select multiple records as you would in any Mac app: shift click selects a range of records, and ⌘ click (command click) selects multiple individual records. Once you have made the selection you can export the records to the Output file using the "Export" button. I've tested this using a 50,000 structure file and the process takes a couple of minutes; on smaller files (< 5,000) the process is almost instantaneous. By clicking on a column header you can sort by that column.
Name to structure
Whilst most of the time you will want to import a file containing molecular structures, I suspect on occasion you have received a file simply containing names or identifiers, like this:
Set the text file as the input file, then click on the "PubChem" button. The app will use the text strings from the input list to search PubChem and then create a new file containing the SMILES strings for the records found. You can then click on the "Import" button to import them into the table view. The PubChem search engine is pretty comprehensive and can accommodate a variety of identifiers, including systematic names, trade names, ChEMBL identifiers and drug names. Depending on location and time of day, you should allow 5 seconds per record for searching.
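A name-to-SMILES lookup of this kind can be sketched with PubChem's PUG REST interface. The URL pattern and the CanonicalSMILES property name below reflect my understanding of that interface; the helper names are hypothetical, and this is not the code iBabel uses.

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def read_names(text):
    """Return the non-empty lines of a name list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pubchem_smiles_url(name):
    """Build the PUG REST URL that returns a SMILES string for a name."""
    encoded = quote(name.strip(), safe="")
    return f"{PUG_REST}/compound/name/{encoded}/property/CanonicalSMILES/TXT"


def fetch_smiles(name, timeout=30):
    """Query PubChem over the network; raises an HTTPError if not found."""
    with urlopen(pubchem_smiles_url(name), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


for name in read_names("aspirin\nacetic acid\n"):
    print(pubchem_smiles_url(name))  # call fetch_smiles(name) to run the query
```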
The results can be viewed, filtered and searched as before, and ⌘ click to select multiple individual records.
The tag field in the Table view is intended for users who want to create their own categories for sorting.
The Seeker Tab
The Seeker tab gives access to a number of external web services; these include PubChem, the Chemical Identifier Resolver and the PDB. For small molecules enter the identifier, choose a search engine and hit search. If the molecule is found a 2D image will be displayed. If you now click "Get SMILES" the SMILES string and InChIKey fields will be populated.
PubChem is an open chemistry database at the National Institutes of Health (NIH). “Open” means that you can put your scientific data in PubChem and that others may use it. Since the launch in 2004, PubChem has become a key chemical information resource for scientists, students, and the general public.
As an alternative to PubChem you can also use the Chemical Identifier Resolver, and a possible enhancement is to include other search options.
This service works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier.
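The resolver is addressed through simple URLs of the form structure/identifier/representation. The sketch below builds such URLs for the SMILES and InChIKey representations; the function name is hypothetical and the code only constructs the URL without contacting the service.

```python
from urllib.parse import quote

CIR = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation="smiles"):
    """Build a Chemical Identifier Resolver URL for one identifier."""
    return f"{CIR}/{quote(identifier.strip(), safe='')}/{representation}"


print(resolver_url("aspirin"))                 # SMILES string
print(resolver_url("aspirin", "stdinchikey"))  # Standard InChIKey
```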
With the SMILES string we can now redraw in 2D using the OpenBabel toolkit layout algorithm, or generate a 3D structure and display it using 3Dmol.js. In addition we can use the InChIKey to search UniChem. UniChem efficiently produces cross-references between chemical structure identifiers from 39 different databases. It can very rapidly search using the InChIKey as the query term; the output is displayed as a web page with links to the appropriate data sources. Clicking on a hyperlink then opens the record in the source database.
The Seeker tab also allows you to search the Protein Data Bank (PDB) http://www.rcsb.org.
This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
To search the PDB, first select the PDB as the search engine, type in the PDB code and click "search"; an image of the structure should be displayed in the right-hand panel. If you want to view the protein in 3D, click the 3D view button and it will be displayed in your web browser.
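If you want the coordinate file itself, a PDB code can be turned into a download URL on the RCSB file server. The sketch below validates the classic four-character code first; the function name is hypothetical and the validation rule is my simplification.

```python
import re


def pdb_download_url(code):
    """Return the RCSB download URL for a four-character PDB code."""
    code = code.strip().upper()
    # Classic PDB codes are a digit followed by three letters or digits.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", code):
        raise ValueError(f"not a valid PDB code: {code!r}")
    return f"https://files.rcsb.org/download/{code}.pdb"


print(pdb_download_url("1crn"))
```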