Macs in Chemistry

Insanely Great Science

Tutorials

Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science

 

A timely tutorial and example workflow.

We have put together a more comprehensive workflow, serving as a blueprint for anyone to build her or his own version of a Guided Analytics application that combines just the right amount of automation and interaction for a specific set of problems.

Full details here



Tips & Tricks for Using KNIME

 

The KNIME blog has a post containing lots of user-submitted tips and tricks:

Ever sat next to a friend or colleague at the computer and were awed when you suddenly realised the way they do certain tasks is much better? We recently asked KNIME users to share their tips and tricks on using KNIME. In this series of posts we’ll be showing you how the experts use KNIME in the hopes that by sharing ideas you’ll discover some handy techniques.



Updated Literature search script

 

I've updated the Vortex script to run text-based queries of PubMed.

If you regularly use the E-utilities API, you might want to read this.

After May 1, 2018, NCBI will limit your access to the E-utilities unless you have one of these keys. Obtaining an API key is quick, and simple, and will allow you to access NCBI data faster. If you don’t have an API key, E-utilities will still work, but you may be limited to fewer requests than allowed with an API key.

After May 1, 2018, any computer (IP address) that submits more than 3 E-utility requests per second will receive an error message. This limit applies to any combination of requests to EInfo, ESearch, ESummary, EFetch, ELink, EPost, ESpell, and EGquery.
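A script that loops over many queries can stay under this limit by pausing between requests. The following is a minimal sketch in plain Python 3, not the Jython environment inside Vortex. The `RequestThrottle` name and its injectable clock and sleep arguments are illustrative only. The default of 3 requests per second matches the limit above; NCBI allows 10 per second for requests that carry an API key.

```python
import time


class RequestThrottle:
    """Block just long enough to keep calls under a requests-per-second limit.

    Hypothetical helper, not part of any NCBI tool. 3/s is the E-utilities
    limit without an API key; NCBI permits 10/s when an API key is supplied.
    """

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable so the timing logic can be tested
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Call this immediately before each E-utility request."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


throttle = RequestThrottle(max_per_second=3)
for term in ["EGFR", "BRAF", "ALK"]:
    throttle.wait()
    # ...send the E-utility request for `term` here...
```

Create one throttle per script and call `wait()` before every request, whichever E-utility (ESearch, EFetch, ELink and so on) is being called, since the limit applies to any combination of them.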

If you write software or scripts that access the E-utilities API, your users will need to get their own API key. Calls will have this format:

https://www.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&api_key=ABCD123

I've updated this script to reflect this change, and I've highlighted where you need to add your API key in the script. I've also tried to ensure that every query string is URL-encoded, and I've extended the search range up to 2018.
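The following minimal Python 3 sketch shows the two changes, the API key and URL encoding, for a PubMed ESearch query. It uses the `eutils.ncbi.nlm.nih.gov` base URL from NCBI's E-utilities documentation and ESearch's `mindate`/`maxdate` parameters for the date range. The function name is hypothetical and `YOUR_API_KEY` is a placeholder for your own key.

```python
from urllib.parse import urlencode

# Base URL given in NCBI's E-utilities documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, api_key=None, mindate=None, maxdate=None, retmax=20):
    """Return a PubMed ESearch URL with the query term safely URL-encoded."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    if mindate and maxdate:
        # ESearch needs both ends of a date range; "pdat" = publication date.
        params.update(datetype="pdat", mindate=mindate, maxdate=maxdate)
    if api_key:
        params["api_key"] = api_key
    # urlencode escapes spaces, quotes and other reserved characters.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url('kinase "hinge region"', api_key="YOUR_API_KEY",
                  mindate="2000", maxdate="2018"))
```

Building the query with `urlencode` rather than by string concatenation means search terms containing spaces, quotes or brackets no longer break the request.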


Flagging Potential Kinase Inhibitors

 

Most kinase inhibitors bind in the ATP binding site, making hydrogen-bonding interactions with the hinge region as shown in the schematic below. We can use our knowledge of these hinge-binding motifs to flag potential kinase inhibitors.

[Schematic: hinge-region hydrogen-bonding interactions in the ATP binding site]

Read more…



Scripting PubMed searches

 

PubMed comprises more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites. NCBI also provides a number of programming tools for accessing this information: the E-utilities are a set of server-side programs that provide a stable interface into the Entrez query and database system.

To access these data, a piece of software first posts an E-utility URL to NCBI, then retrieves the results of this posting, after which it processes the data as required. The software can thus use any computer language that can send a URL to the E-utilities server and interpret the XML response; examples of such languages are Perl, Python, Java, and C++.
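Interpreting the XML response can be as simple as reading two parts of the reply: the `Count` element and the `Id` elements inside `IdList`. The following is a minimal Python 3 sketch using the standard library's ElementTree. The sample response is made up and trimmed to just those elements, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up ESearch response, trimmed to the elements used below.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>11111111</Id>
    <Id>22222222</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total hit count, list of PubMed IDs) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    # findtext("Count") only looks at direct children of the root, so any
    # Count elements nested deeper in the reply are ignored.
    count = int(root.findtext("Count", default="0"))
    pmids = [el.text for el in root.findall("IdList/Id")]
    return count, pmids


print(parse_esearch(SAMPLE_RESPONSE))
```

Note that `Count` is the total number of matching records, while `IdList` holds at most `retmax` IDs, so the two can differ for large result sets.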

A while back I wrote a Vortex script that helps with these sorts of searches when you have multiple terms to search. I've updated it to incorporate the new requirement for API keys when making multiple requests to the E-utilities API, and I've highlighted where you need to add your own API key in the script. I've also tried to ensure that every query string is URL-encoded.
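A multi-term search of this kind can be sketched in plain Python 3 as a loop of ESearch calls with `rettype=count`, which returns only the number of matching records. This is an illustration, not the Vortex script itself. The function names are hypothetical and `API_KEY` is a placeholder. The download and sleep functions are passed in, so the loop can be tried without network access.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
API_KEY = None  # placeholder: put your own NCBI API key here


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def count_pubmed_hits(terms, api_key=API_KEY, fetch=fetch_bytes, sleep=time.sleep):
    """Return {term: number of PubMed records} for each search term."""
    delay = 0.11 if api_key else 0.34  # under 10/s with a key, 3/s without
    counts = {}
    for i, term in enumerate(terms):
        if i:
            sleep(delay)  # pause between consecutive requests
        params = {"db": "pubmed", "term": term, "rettype": "count"}
        if api_key:
            params["api_key"] = api_key
        root = ET.fromstring(fetch(EUTILS_ESEARCH + "?" + urlencode(params)))
        counts[term] = int(root.findtext("Count", default="0"))
    return counts
```

For example, `count_pubmed_hits(["EGFR inhibitor", "BRAF inhibitor"], api_key="YOUR_API_KEY")` makes one request per term and returns a dictionary of counts that could be loaded into a table.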

The update is detailed more fully here….


Interacting with the RCSB Protein Data Bank

 

The RCSB Protein Data Bank is an absolutely invaluable resource that provides archival information about the 3D shapes of proteins, nucleic acids, and complex assemblies, helping scientists understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. At the time of writing, the PDB contains over 134,000 data files with structural information on 42,547 distinct protein sequences, of which 37,600 are human sequences. The RCSB also provides a series of tools to search, view and analyse the data.

The latest addition to the Hints and Tutorials page is a pair of Vortex scripts for interacting with the RCSB Protein Data Bank: they search for PDB structures associated with a list of UniProt codes and then retrieve associated information. Read more here…
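The RCSB web services have changed since these scripts were written. As an illustration only, this Python 3 sketch builds a query for the current RCSB Search API (the v2 endpoint at `search.rcsb.org`). The query asks for PDB entries whose polymer entities reference a given UniProt accession; the example accession, P00533, is human EGFR. The attribute names and the `return_all_hits` option follow RCSB's search documentation, but check them against the current docs before relying on this. The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# RCSB Search API (v2) endpoint; check RCSB's documentation for changes.
RCSB_SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"
ID_FIELD = "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers"


def exact_match(attribute, value):
    """One terminal text-service node matching an attribute exactly."""
    return {"type": "terminal", "service": "text",
            "parameters": {"attribute": attribute,
                           "operator": "exact_match", "value": value}}


def uniprot_query(accession):
    """Query payload: PDB entries whose polymer entities cite this UniProt code."""
    return {
        "query": {"type": "group", "logical_operator": "and", "nodes": [
            exact_match(ID_FIELD + ".database_accession", accession),
            exact_match(ID_FIELD + ".database_name", "UniProt"),
        ]},
        "return_type": "entry",
        "request_options": {"return_all_hits": True},  # results are paginated otherwise
    }


def search_pdb(accession):
    """POST the query and return the matching PDB IDs (needs network access)."""
    body = json.dumps(uniprot_query(accession)).encode("utf-8")
    request = Request(RCSB_SEARCH_URL, data=body,
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        text = response.read().decode("utf-8")
    if not text:  # no matches: the service may reply with an empty body
        return []
    return [hit["identifier"] for hit in json.loads(text).get("result_set", [])]


print(json.dumps(uniprot_query("P00533"), indent=2))
```

Looping `search_pdb` over a list of UniProt codes gives the PDB IDs for each one, and these could then be used to look up further information.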


Predicting sites of metabolism Vortex script

 

It is really useful to have two site-of-metabolism tools available that use contrasting methodologies. FAME 2 uses a curated dataset of experimentally determined metabolism data to build a machine-learning model from simple descriptors. In contrast, SMARTCyp uses precomputed activation energies from density functional theory (DFT) calculations on model compounds.

I previously wrote a script that displays the results of a SMARTCyp calculation in a webview. The first part of that script imported smartcyp.jar; however, with each update I ran into issues, so I thought it might be better to treat SMARTCyp simply as a command-line application and call it using subprocess.
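The command-line approach can be sketched in Python 3 as follows. Inside Vortex's Jython environment, `subprocess.run` may not be available, and `subprocess.Popen` would be used instead. The function name is hypothetical. The SMARTCyp command in the comment, including the jar path and input file, is an assumption to adapt to your own installation. The demonstration at the end uses the Python interpreter as a stand-in for the external tool.

```python
import subprocess
import sys


def run_tool(args, timeout=600):
    """Run an external command-line program and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if the program runs longer than `timeout` seconds.
    """
    completed = subprocess.run(args, capture_output=True, text=True,
                               timeout=timeout, check=True)
    return completed.stdout


# Hypothetical SMARTCyp call; adjust the jar path and input file to your setup:
# run_tool(["java", "-jar", "/path/to/smartcyp.jar", "molecules.sdf"])

# Stand-in demonstration using the Python interpreter as the "external tool".
print(run_tool([sys.executable, "-c", "print('stand-in tool ran')"]))
```

Because the script talks to the program only through its command line, a new SMARTCyp release just needs the jar path updated, rather than changes to code that depends on the jar's internal classes.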

Using a similar script, we can also access FAME 2.

More details here.


Dealing with Greek characters in column names

 

This is just a very quick tip for dealing with Greek characters in Vortex column names when writing a script. It may be obvious to many, but I struggled for several hours before finding the problem and a solution.

Read more…



Flexible UniChem Search

 

UniChem is a web resource provided by the EBI. It is a 'Unified Chemical Identifier' system designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases. At the time of writing, UniChem contains data from 27 different data sources and provides links to 108,941,995 structures.

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI: http://dx.doi.org/10.1186/1758-2946-5-3

The previous script showed how to search using a ChEMBL ID; however, one of the attractions of UniChem is that you can search with any molecule identifier as long as you know its corresponding data source. This script lets the user supply any molecule identifiers and then search a specified data source using a common web service.
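The idea of "identifier plus data source" maps directly onto a lookup URL. This Python 3 sketch builds URLs in the pattern of UniChem's legacy REST interface (`src_compound_id/{id}/{source}/{target source}`), the style of web service available when these scripts were written. The EBI has since introduced a newer UniChem API, so check the current documentation before use. Only two source IDs are shown (1 = ChEMBL, 2 = DrugBank); UniChem publishes the full list. The function name is hypothetical.

```python
from urllib.parse import quote

# Legacy UniChem REST pattern; the EBI now also offers a newer API.
UNICHEM_LEGACY = "https://www.ebi.ac.uk/unichem/rest/src_compound_id"

# A few UniChem source IDs; the full list is published by UniChem.
SOURCES = {"chembl": 1, "drugbank": 2}


def unichem_url(compound_id, from_source, to_source=None):
    """Build a lookup URL for `compound_id` as held in `from_source`.

    Without `to_source`, the lookup asks for matches in all sources.
    """
    parts = [UNICHEM_LEGACY, quote(compound_id, safe=""), str(SOURCES[from_source])]
    if to_source is not None:
        parts.append(str(SOURCES[to_source]))
    return "/".join(parts)


print(unichem_url("CHEMBL12", "chembl", "drugbank"))
```

Keeping the identifier and source as separate arguments lets one function serve any combination of identifier type and target data source.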

Read more …



Getting UniChem data from ChEMBL

 

UniChem is a web resource provided by the EBI. It is a 'Unified Chemical Identifier' system designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases. At the time of writing, UniChem contains data from 27 different data sources and provides links to 108,941,995 structures.

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI: http://dx.doi.org/10.1186/1758-2946-5-3

ChEMBL also provides a RESTful web service that can be used to retrieve data from the UniChem database programmatically.

Read more…



Installing Checkmol/Matchmol under Mac OS X

 

Checkmol is a command-line utility program which reads molecular structure files in different formats and analyzes the input molecule for the presence of various functional groups and structural elements. At present, approx. 200 different functional groups are recognized. Output can be either clear text (English or German), a bitstring or its ASCII representation, or a set of special 8-character codes. This output can be easily placed into a database table, permitting the creation of chemical databases with a functional group search option. It was written by Norbert Haider, Department of Pharmaceutical Chemistry (now: Department of Drug and Natural Product Synthesis), University of Vienna, Austria.

The software is available both as source code and as a binary compiled for Linux (x86 architecture). It is written entirely in Pascal and was compiled with Free Pascal 1.0.11 or, from v0.4c onwards, Free Pascal 2.4.0. To install it, we therefore first need a Pascal compiler, which can be downloaded from SourceForge.

Full details are here.


Importing Open Source Malaria Project data

 

The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date, many people around the world have made contributions, and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, the information is often spread over multiple web pages, and it can be tricky to link each molecule with its identifier and data. Over the last couple of months, a significant effort has been put into populating a spreadsheet with all the information.

Whilst this is useful for viewing results, it is not ideal for building predictive models. Vortex is a chemically intelligent data analysis and visualisation platform. This script provides one-click access to the OSM data and creates a workspace containing all of it, and since it is linked to the live spreadsheet, you will always have access to the latest data.
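The key idea, reading the live spreadsheet afresh each time rather than working from a saved copy, can be sketched in plain Python 3. The sketch assumes the spreadsheet is a publicly shared Google Sheet that can be downloaded as CSV through its `export?format=csv` URL. `SHEET_ID` is a placeholder, not the real key of the OSM sheet. The column names and rows in the sample are invented for illustration, and the function names are hypothetical.

```python
import csv
import io
from urllib.request import urlopen

SHEET_ID = "YOUR_SHEET_ID"  # placeholder for the spreadsheet's key
CSV_URL = "https://docs.google.com/spreadsheets/d/%s/export?format=csv" % SHEET_ID


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_live_sheet(url=CSV_URL):
    """Download the current version of the sheet (needs network access)."""
    with urlopen(url) as response:
        return rows_from_csv(response.read().decode("utf-8"))


# Invented sample standing in for the downloaded sheet.
sample = "Compound ID,SMILES,Potency\nCMPD-1,c1ccccc1,\nCMPD-2,CCO,1.2\n"
for row in rows_from_csv(sample):
    print(row["Compound ID"], row["SMILES"], row["Potency"] or "no data")
```

Empty cells come back as empty strings, which is worth handling explicitly because many compounds in a shared sheet will not yet have biological data.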


Scripting Vortex 25

 

Whilst most of the Vortex scripts mentioned on this site to date involve chemical structures, we should not forget that Vortex is an excellent general data analytics tool and the data set does not have to include any molecular structures. Recently I was asked about the number of publications associated with a particular potential therapeutic target, and it struck me that Vortex might actually be an excellent tool to investigate this.

Read More.


Cheminformatics on a Mac

 

A little while back I wrote a detailed tutorial for getting a wide variety of cheminformatics tools running on a Mac.

Someone just let me know about an issue with OSRA, a utility designed to convert graphical representations of chemical structures, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES.

It turns out that OSRA requires Ghostscript to process PDF images; it can be installed using Homebrew.

brew install ghostscript
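A script that calls OSRA on PDF files could check for this dependency before it starts. The following is a minimal Python 3 sketch using the standard library's `shutil.which`. It assumes Ghostscript is on the PATH under its usual command name, `gs`, which is what the Homebrew formula installs. The function name is hypothetical.

```python
import shutil


def find_ghostscript():
    """Return the path to Ghostscript's `gs` executable, or None if absent."""
    return shutil.which("gs")


if find_ghostscript() is None:
    print("Ghostscript not found; OSRA cannot process PDF files. "
          "Install it with: brew install ghostscript")
```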

MedChem Wizard KNIME workflow

 

The MedChemWizard is a KNIME workflow designed to assist medicinal chemists with idea generation, ligand design and lead optimization using a number of common functional group transformations and medchem rules of thumb. This tutorial, provided by Dr. Alastair Donald, gives a detailed description of its use.


BBEdit tutorial

 

I'm a long time BBEdit user but I still enjoy reading tips for making your use of BBEdit more efficient.

This blog post offers some tips for the various "Find" options within BBEdit.

I'd certainly agree with the final comment.

Text editors with limited capabilities keep you at a beginner level, no matter how long you've been using them. Serious text editors have a depth that rewards their users.


Bringing Open Source to Drug Discovery

 

Last week I gave a talk entitled “Bringing Open Source to Drug Discovery” at the RSC 25th Symposium on Medicinal Chemistry in Eastern England.

The slides and pages of links are available here.

I also captured the laptop screen of the demo which I’ve now put on YouTube.

https://www.youtube.com/watch?v=sG9vDIfp0NE&feature=youtu.be

The aim was to show what open-source tools are available and how they can be integrated into proprietary tools using scripting; many of the scripts are available on the Hints and Tutorials page.



Scripting Vortex 12

In the previous tutorial we used the Virtual Computational Chemistry Laboratory web service to calculate aLogP and LogS; both results were returned in a simple text format. More recently, JSON has become increasingly popular for data exchange.

JSON, or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages, including C, C++, C#, Java, JavaScript, Perl and Python.

Molinspiration provides a number of cheminformatics tools and also a RESTful web service, which can be used to calculate a range of molecular properties and bioactivity predictions.

The output from both web services is available either as a JSON string or as plain text, and the web service is accessed by submitting a URL.
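SMILES strings contain characters such as `#`, `+` and `=` that have special meaning in URLs, so they need to be percent-encoded before being placed in a request URL. A JSON reply can then be read straight into a dictionary. This is a minimal Python 3 sketch of both steps. The endpoint is a placeholder, not Molinspiration's actual URL, and the property names and values in the sample reply are invented. The function names are hypothetical.

```python
import json
from urllib.parse import quote

# Placeholder endpoint, NOT a real Molinspiration URL.
SERVICE_URL = "https://example.org/properties?smiles="


def property_url(smiles):
    """Percent-encode every reserved character in the SMILES string."""
    return SERVICE_URL + quote(smiles, safe="")


def parse_properties(json_text):
    """Turn a JSON reply into a {property name: value} dictionary."""
    return json.loads(json_text)


print(property_url("C#N"))  # an unencoded '#' would start a URL fragment
reply = '{"logP": 0.5, "TPSA": 23.8}'  # invented reply for illustration
print(parse_properties(reply)["TPSA"])
```

Compared with the plain-text output used previously, the JSON reply needs no custom parsing: each property is looked up by name.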

Full details of the script are here.


Scripting Vortex: Accessing a web service

I’ve just added the latest script for Vortex.

In previous scripts we generated data using a local Java program, C program, Perl script and SVL program. In this tutorial, rather than having a local application generate the data, we will use a web service.


There are more scripts on the Hints and Tutorials page.




ChemDoodle, WebGL and Protein Ribbons

A tutorial demonstrating the use of ChemDoodle and WebGL to display proteins as ribbon diagrams. Read More...