Macs in Chemistry

Insanely great science

 

Importing Open Source Antibiotics Data into DataWarrior

Although there is a pressing need for research into and development of new antibiotics, policy-makers do not want healthcare professionals to use them. Because of the antimicrobial resistance caused by inappropriate use of the existing classes of antibiotics over many years, new products should sit on the shelf until patients really need them. As a result, many Pharma companies have dropped investment in antibiotic research because they feel they will be unable to recoup it.

The public health implications of a drying pipeline of new antibiotics have led to an increased focus on alternative models to incentivise antimicrobial research. The Open Source Antibiotics Consortium is a group of researchers working together to discover new antibiotics in an open source manner. All data is freely available and in the open. There is also an effort to develop open source solutions to the computational needs of the project. The CompChem Tools page provides links to a variety of open-source tools, scripts and Jupyter notebooks.

Currently there is no chemical database for the project, and all structures are held in a freely accessible online spreadsheet. This contains the name, structure (as SMILES), InChIKey, etc., together with the biological data. Whilst this provides ready access, it is not really in a useful format. There is a Jupyter notebook that links to the spreadsheet, but that is only really popular with computational chemists. There is clearly a need to provide ready access for medicinal chemists and biologists.

DataWarrior is a chemically intelligent spreadsheet that can be used to visualise molecules.

DataWarrior2

Whilst it is possible for users to download the spreadsheet and then import it, it would be better if this could be done automatically. Fortunately, DataWarrior has a macro function that allows the user to automate actions. The macros are written in a plain text, XML-like language; the example below calculates a set of compound properties and then combines them into a DrugScore column.

<macro name="Calculate Properties">
<task name="calculateCompoundProperties">
propertyList=fragmentWeight logP    logS    acceptors   donors  druglikeness    mutagenic   tumorigenic reproEffective  irritant
structureColumn=Structure
</task>
<task name="calculateNewColumn">
columnName=DrugScore
formula=(0.5+0.5/(1+exp(cLogP-5)))*(1-0.5/(1+exp(cLogS+5)))*(0.5+0.5/(1+exp(0.012*Molweight-6)))*(1-0.5/(1+exp(Druglikeness)))*if(Mutagenic=="high",0.6,if(Mutagenic=="low",0.8,1))*if(Tumorigenic=="high",0.6,if(Tumorigenic=="low",0.8,1))*if(ReproductiveEffective=="high",0.6,if(ReproductiveEffective=="low",0.8,1))*if(Irritant=="high",0.6,if(Irritant=="low",0.8,1))
</task>
</macro>
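
To make the DrugScore formula in the macro easier to read, here is a minimal Python sketch of the same arithmetic for a single compound. It is only an illustration, not DataWarrior's own code: the function name and argument names are made up for this example, and the druglikeness term is taken as 1-0.5/(1+exp(Druglikeness)), matching the pattern of the other terms.

```python
import math

# Multipliers for the four toxicity-risk columns, as in the macro's if(...) terms.
RISK_FACTOR = {"high": 0.6, "low": 0.8}


def drug_score(clogp, clogs, molweight, druglikeness, risks=()):
    """Return the DrugScore for one compound (hypothetical helper).

    risks holds the levels for Mutagenic, Tumorigenic, ReproductiveEffective
    and Irritant; any value other than "high" or "low" counts as a factor of 1.
    """
    score = 0.5 + 0.5 / (1 + math.exp(clogp - 5))
    score *= 1 - 0.5 / (1 + math.exp(clogs + 5))
    score *= 0.5 + 0.5 / (1 + math.exp(0.012 * molweight - 6))
    # Assumption: the druglikeness term divides, like the three terms above.
    score *= 1 - 0.5 / (1 + math.exp(druglikeness))
    for level in risks:
        score *= RISK_FACTOR.get(level, 1.0)
    return score


if __name__ == "__main__":
    # Illustrative property values only, not taken from the OSA spreadsheet.
    print(round(drug_score(2.0, -3.0, 350.0, 1.5, ("none", "none", "low", "none")), 3))
```

Each of the first four terms lies between 0.5 and 1, so a compound is penalised progressively for high logP, low solubility, high molecular weight or poor druglikeness, and each flagged toxicity risk scales the score down further.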

With the help of Isabelle Giraud we were able to create a macro that imports the data directly into DataWarrior, using the same URL as the Jupyter notebook.

<macro name="OSA All Molecules">
<task name="retrieveDataFromURL">
url=https://docs.google.com/spreadsheets/d/168-a1_l51Nfbms67eG8zU8p-EhEtEO26FUzRInbu7fY/export?format=tsv&gid=2078630269
format=td
</task>
</macro>

You can paste this into a text editor and save the file somewhere. Then, in DataWarrior, select import from the Macro menu. The macro should then appear in the "Run Macro" menu items. Run the macro and you should see something like this.

Screenshot 2021-03-29 at 09.53.03

The online spreadsheet also contains several tabs for the different projects. These can each be accessed by modifying the gid value in the URL, as shown below. Note that the link to each tab contains a hash symbol "#", which needs to be removed.

<macro name="OSA Series2 data">
<task name="retrieveDataFromURL">
url=https://docs.google.com/spreadsheets/d/168-a1_l51Nfbms67eG8zU8p-EhEtEO26FUzRInbu7fY/export?format=tsv&gid=1078638615
format=td
</task>
</macro>

This macro will then pull the data from a specific tab on the spreadsheet.
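
If you want a macro for every tab, the URL editing can be scripted. The Python sketch below is one possible way to do it, under two assumptions that are not stated above: that the link copied from the browser for a tab has the form .../edit#gid=NNN, and that url and format are separate parameters of the retrieveDataFromURL task. The function names are invented for this example.

```python
from urllib.parse import parse_qs, urlsplit

EXPORT_TEMPLATE = (
    "https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=tsv&gid={gid}"
)


def tab_link_to_export_url(tab_link):
    """Turn a browser link to one spreadsheet tab into a TSV export URL."""
    parts = urlsplit(tab_link)
    # Assumed path layout: /spreadsheets/d/<sheet_id>/edit
    segments = parts.path.split("/")
    sheet_id = segments[segments.index("d") + 1]
    # The tab id sits after the hash, e.g. "#gid=1078638615".
    gid = parse_qs(parts.fragment)["gid"][0]
    return EXPORT_TEMPLATE.format(sheet_id=sheet_id, gid=gid)


def macro_text(name, export_url):
    """Wrap an export URL in a one-task DataWarrior macro."""
    return (
        f'<macro name="{name}">\n'
        '<task name="retrieveDataFromURL">\n'
        f"url={export_url}\n"
        # Assumption: format=td is a separate task parameter on its own line.
        "format=td\n"
        "</task>\n"
        "</macro>\n"
    )


if __name__ == "__main__":
    link = (
        "https://docs.google.com/spreadsheets/d/"
        "168-a1_l51Nfbms67eG8zU8p-EhEtEO26FUzRInbu7fY/edit#gid=1078638615"
    )
    print(macro_text("OSA Series2 data", tab_link_to_export_url(link)))
```

The printed text can be saved to a file and imported from the Macro menu in the same way as the hand-written macros.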

All the macros can be downloaded here: http://macinchem.org/reviews/OSA/MyMacros.zip

Loading Macros on Startup

Whilst this works fine as it stands, the user has to import the macros each time they restart DataWarrior; it would be nice if they were loaded automatically on startup. I noticed there are a couple of macros already installed, and after a little hunting I found them in the application package. If you right-click on the DataWarrior icon you should see a dropdown menu; select "Show Package Contents".

This should then show the following folder structure; the folder "macro" contains the already installed macros. I copied the downloaded "MyMacros" folder into the DataWarrior macro folder.

Screenshot 2021-03-29 at 10.06.06

Now when you restart DataWarrior the macros are always available.
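
The copy step can also be scripted, which may be handy if it has to be repeated, for example after DataWarrior is updated. The Python sketch below is only an illustration: the function name is made up, and both folder paths must be supplied by you, since the exact location of the "macro" folder inside the application package is not assumed here.

```python
import shutil
from pathlib import Path


def install_macros(downloaded_folder, datawarrior_macro_folder):
    """Copy a folder of macro files into DataWarrior's "macro" folder.

    Both paths come from the caller; nothing about the layout of the
    application package is guessed.
    """
    source = Path(downloaded_folder)
    target = Path(datawarrior_macro_folder) / source.name
    # dirs_exist_ok (Python 3.8+) allows the copy to be run again later.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Call it with the unzipped "MyMacros" folder and the path to the "macro" folder you found via "Show Package Contents", then restart DataWarrior.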

Screenshot

If you are not familiar with DataWarrior, the RSC CICAG has hosted a number of workshops on key open-source chemistry software. These workshops are now all available on YouTube.

DataWarrior https://youtu.be/Is2hLqqSFvM
PyMOL https://youtu.be/qOxS2wqajdg
Google Colab https://youtu.be/KEIpJ50Jc0w
ChEMBL https://youtu.be/zpzJutFTtL4
Fragalysis https://youtu.be/LVWd50CgU4g
Knime https://youtu.be/lP0Yh6kKNsA
ChimeraX https://youtu.be/M2K72Kgk718

Last Updated 29 March 2021