KNIME workshop on clustering molecules
The latest RSC CICAG Open-Source Tools for Chemistry Workshop is now on YouTube. RSC CICAG Open Source Tools for Chemistry: Clustering using KNIME.
Clustering is an invaluable cheminformatics technique for subdividing a typically large compound collection into small groups of similar compounds. One of the advantages is that once clustered you can store the cluster identifiers and refer to them later; this is particularly valuable when dealing with very large datasets. Clustering is often used in the analysis of high-throughput screening results, or in the analysis of virtual screening or docking studies.
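To make the idea of storing cluster identifiers concrete, here is a minimal sketch in plain Python. It is not the method used in the workshop: it assumes each compound has already been reduced to a fingerprint represented as a set of "on" bit positions, and it uses a simple leader-style scheme with Tanimoto similarity. The function names and the threshold value are illustrative assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold=0.6):
    """Assign each fingerprint to the first cluster whose leader is similar enough.

    Returns one integer cluster identifier per input fingerprint.
    The threshold is an arbitrary illustrative value.
    """
    leaders = []      # index of the first member (leader) of each cluster
    cluster_ids = []  # cluster identifier for each fingerprint, in input order
    for i, fp in enumerate(fingerprints):
        for cid, leader in enumerate(leaders):
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                cluster_ids.append(cid)
                break
        else:
            # No existing leader is similar enough: start a new cluster.
            leaders.append(i)
            cluster_ids.append(len(leaders) - 1)
    return cluster_ids


# Toy fingerprints (hypothetical data, not real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
ids = leader_cluster(fps, threshold=0.5)
print(ids)
```

The returned list can be stored alongside the compound identifiers, so that later analyses can look up cluster membership without re-running the clustering.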
There is a comparison of clustering options here.
The previous 14 workshops are available on a CICAG YouTube playlist.
You can still register for next month's workshops here https://www.eventbrite.com/e/open-source-tools-for-chemistry-workshops-tickets-156431429617.
RSC CICAG KNIME workshop materials
Just got this message from Greg Landrum:
The workshop tomorrow will have a hands-on component. This isn't mandatory, but I think you'll get more from the workshop if you follow along with what we're doing in your own local copy of the workflow.
In order to get you started and make sure that you have all the KNIME pieces that you need installed, I created a space in the public KNIME hub for the workshop and have uploaded some introductory material there:
If you are logged into the KNIME hub (registration is free), you can download the workflow and data by simply using the download button:
Once you've downloaded the workflow package, you should be able to import everything into the KNIME Analytics Platform by double-clicking the downloaded "ClusteringIntro.knar" file.
If for some reason you don't want to register, then you'll need to navigate to the page for the workflow and the Data folder, download everything individually, and import them into KNIME Analytics Platform manually. You should be able to find information online for how to do this, but I won't be able to help you with this during the workshop due to time constraints.
When you open the 01_Clustering workflow, you may be asked if you want to install missing extensions. Please do this in order to ensure that you have everything necessary to follow along during the workshop. Everything we install as part of this process is free and open source.
Shortly before the workshop starts I will share an additional workflow which we'll use for the main part of the workshop. I'll give you this link during the workshop.
Note that the sample workflows we are using were created with KNIME 4.4 (the version released this summer). For workshops like this we like to use recent versions of KNIME so that we can show you the newest features and capabilities. If you have an older version of KNIME things may or may not work correctly and you may have to replace one or more of the nodes with older equivalents.
You can download KNIME here https://www.knime.com/downloads
Workshop on Open-Source Tools for Chemistry
Just a couple of notes for software installs prior to the event for those attending the free online Workshop on Open-Source Tools for Chemistry 9-13 November 2020.
Monday 13-30 to 15-30 Cheminformatics and Data Analysis using DataWarrior (Isabelle Giraud)
DataWarrior can be downloaded from here http://www.openmolecules.org/datawarrior/download.html
The training files can all be downloaded from here
Monday 16-00 to 18-00 Molecular visualisation using PyMOL (Garrett Morris)
Software to install:
PyMOL via Conda:
Conda: https://www.anaconda.com/distribution/
or Miniconda: https://docs.conda.io/en/latest/miniconda.html
https://anaconda.org/psi4/pymol or https://omicx.cc/2019/05/26/install-pymol-windows/
PyMOL via MacPorts:
http://www.ub.edu/cbdd/?q=content/installing-pymol-macports
% sudo port install tcl -corefoundation
% sudo port install tk -quartz
% sudo port install pymol
PyMOL from GitHub:
https://github.com/schrodinger/pymol-open-source
Tuesday 11-00 to 13-00 Chemistry in the cloud: leveraging Google Colab for quantum chemistry (Jan Jensen)
Participants should download Chrome and have a Google account
Participants should make sure they can access this page: https://bit.ly/37fIYbp.
Some basic degree of Python proficiency is required for the course
It would be great if participants could fill out this survey https://forms.gle/pjwsnJTb4X6QpiHK9 early enough to help me design the course
Wednesday 13-30 to 15-30 Accessing biological and chemical data in ChEMBL (Anna Gaulton)
Requires a modern web browser (with JavaScript not blocked) such as Chrome/Safari
Thursday 16-00 to 18-00 Fragment based screening, XChem at Diamond (Rachael Skyner)
Requires the Chrome web browser. If there is time, Rachael would like to give an introduction to the new Python API. We can go through the installation at the workshop, but you must have Anaconda installed.
Friday 11-00 to 13-00 An introduction to KNIME workflows (Greg Landrum)
KNIME can be downloaded here https://www.knime.com/downloads
Registration: This event will be free to attend, but registration is required.
More details and registration can be found here https://www.rsc.org/events/detail/43180/workshop-on-open-source-tools-for-chemistry.
Last Updated 28 October 2020
Workshop on Open-Source Tools for Chemistry
All scientists working in chemistry need software tools for accessing, handling and storing chemical information, or performing molecular modelling and computational chemistry. There is now a wealth of open-source tools to help in these activities; however, many are not as well-known as commercial offerings. This workshop offers a unique opportunity for attendees to try out a range of open-source software packages for themselves with expert tuition in different aspects of chemistry.
The software packages will be presented over six two-hour sessions as follows:
09 November: 13.30 - 15.30 Cheminformatics and data analysis using DataWarrior (Isabelle Giraud)
09 November: 16.00 - 18.00 Molecular visualization using PyMOL (Garrett M Morris)
10 November: 11.00 - 13.00 Chemistry in the cloud: leveraging Google Colab for quantum chemistry (Jan Jensen)
11 November: 13.30 - 15.30 Accessing biological and chemical data in ChEMBL (Anna Gaulton)
12 November: 16.00 - 18.00 Fragment-based screening, XChem at Diamond (Rachael Skyner)
13 November: 11.00 - 13.00 Interactive and automated chemical data analysis with KNIME (Greg Landrum)
Registration: This event will be free to attend, but registration is required.
More details and registration can be found here https://www.rsc.org/events/detail/43180/workshop-on-open-source-tools-for-chemistry.
Online Events
The current global pandemic means that more events are moving online. Here are details of a few that have been sent to me:
Dotmatics User Symposium | Cambridge 2020 14th & 15th October Details and Registration.
KNIME Introduction to Working with Chemical Data October 12 - 16, 2020 details and registration.
Virtual RDKit UGM 6-8 October 2020 details and registration.
16th German Conference on Cheminformatics and EuroSAMPL Satellite Workshop 2-3 November 2020 details
Open Chemical Science 9 - 13 November 2020 details.
3D-e-Chem NLeSC project
This looks interesting: the 3D-e-Chem NLeSC project.
This project will develop technologies to improve the integration of ligand and protein data for structure-based prediction of protein-ligand selectivity and polypharmacology.
The project will use KNIME Analytics Platform to integrate the different technologies and datasets.
Data curation workflow
One of the most time-consuming parts of any data analysis is curating the input data prior to any model building. This KNIME workflow is fully documented and described, and as such is an invaluable starting point.
A semi-automated procedure is made available to support scientists in data preparation for modelling purposes. The procedure addresses:
- Automatic chemical data retrieval (i.e., SMILES) from different, orthogonal web-based databases, by using two different identifiers, i.e. chemical name and CAS registration number. Records were scored based on the coherence of information retrieved from different web sources.
- Data curation procedure performed on top-scored records. The procedure includes removal of inorganic and organometallic compounds and mixtures, neutralization of salts, removal of duplicates, checking of tautomeric forms.
- Standardization of chemical structures yielding ready-to-use data for the development of QSARs.
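The scoring step in the first bullet can be illustrated with a small sketch. The workflow's actual scoring scheme is not described here, so the rule below is an assumption: a record's score is the fraction of queried sources that returned the most common SMILES. The source names, the 0.6 cut-off and the function name are hypothetical, and the SMILES strings are assumed to be already canonicalized so that plain string comparison is meaningful.

```python
from collections import Counter


def score_record(smiles_by_source):
    """Return (consensus SMILES, fraction of sources agreeing with it).

    smiles_by_source maps a source name to the SMILES it returned,
    or None if the source had no entry. This is an illustrative
    scoring rule, not the one used by the published workflow.
    """
    found = [s for s in smiles_by_source.values() if s]
    if not found:
        return None, 0.0
    consensus, votes = Counter(found).most_common(1)[0]
    return consensus, votes / len(smiles_by_source)


# Hypothetical retrieval results from three hypothetical sources.
records = {
    "ethanol": {"source_a": "CCO", "source_b": "CCO", "source_c": "CCO"},
    "aspirin": {
        "source_a": "CC(=O)Oc1ccccc1C(=O)O",
        "source_b": "CC(=O)Oc1ccccc1C(=O)O",
        "source_c": None,
    },
    "unknown": {"source_a": None, "source_b": None, "source_c": None},
}

scores = {name: score_record(hits) for name, hits in records.items()}

# Keep only records whose score passes an (arbitrary) cut-off.
curated = {name: smi for name, (smi, score) in scores.items() if score >= 0.6}
print(curated)
```

In a real pipeline the later curation steps (salt neutralization, duplicate removal, tautomer checks) would need a cheminformatics toolkit; they are deliberately left out of this sketch.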
LigandScout 4.3 released
Inte:Ligand have just announced the release of LigandScout 4.3.
The LigandScout software suite comprises the most user friendly molecular design tools available to chemists and modelers worldwide. The platform seamlessly integrates computational technology for designing, filtering, searching and prioritizing molecules for synthesis and biological assessment.
This is a significant update that expands LigandScout's molecular dynamics support and adds halogen bonding as a new pharmacophoric element. In addition, plotting has received an upgrade.
Furthermore, LigandScout 4.3 Expert introduces a completely new set of features summarized under the term Remote Execution. It is now possible to screen large compound libraries on remote High Performance Computing directly from within the graphical LigandScout user interface.
It can be downloaded here http://www.inteligand.com/ligandscout4/downloads/LigandScout43macos20181012.dmg
You can read about the technology behind LigandScout here DOI and there is a review of an earlier version here.
In addition there are now over 40 LigandScout nodes for KNIME.
KNIME Analytics Platform is the open source software for creating data science applications, workflows and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.
REALizer KNIME workflow from BioSolveIT
BioSolveIT have added to their collection of KNIME workflows.
The "REALizer" helps you to post-process the results from searches in the REAL Space, leading you to those compounds of biggest interest.
Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science
A timely tutorial and example workflow.
we have put together a more comprehensive workflow, serving as a blueprint for anyone to build her or his own version of a Guided Analytics application to combine just the right amount of automation and interaction for a specific set of problems.
KNIME update
What’s New in KNIME Analytics Platform 3.6.
- KNIME Deep Learning
- Constant Value Column Filter
- Numeric Outliers
- Column Expressions
- Scorer (JavaScript)
- Git Nodes
- Call Workflow (Table Based)
- KNIME Server Connection
- Text Processing
- Usability Improvements
- Connect/Unconnect nodes using keyboard shortcuts
- Zooming
- Replacing and connecting nodes with node drop
- Node repository search
- Usability improvements in the KNIME Explorer
- Copy from/Paste to JavaScript Table view/editor
- Miscellaneous
- Performance: Column Store (Preview)
- Making views beautiful: CSS changes
- KNIME Big Data Extensions
- Create Local Big Data Environment
- KNIME H2O Sparkling Water Integration
- Support for Apache Spark v2.3
- Big Data File Handling Nodes (Parquet/ORC)
- Spark PCA
- Spark Pivot
- Frequent Item Sets and Association Rules
- Previews
- Create Spark Context via Livy
- Database Integration
- Apache Kafka Integration
- KNIME Server
- Management (Client Preferences)
- Job View (Preview)
- Distributed Executors (Preview)
- General release notes
- JSON Path library update
- Java Snippet Bundle Imports
I suspect it will be KNIME Deep Learning that catches the eye: the ability to set up deep learning models using drag and drop. You can use regular TensorFlow models within KNIME Analytics Platform and seamlessly convert from Keras to TensorFlow for efficient network execution.
The new Create Local Big Data Environment node creates a fully functional local big data environment including Apache Spark, Apache Hive and HDFS. It allows you to try out the nodes of the KNIME Big Data Extensions without a Hadoop cluster.
Tips & Tricks for Using KNIME
The KNIME blog has a post containing lots of user-submitted tips and tricks.
Ever sat next to a friend or colleague at the computer and were awed when you suddenly realised the way they do certain tasks is much better? We recently asked KNIME users to share their tips and tricks on using KNIME. In this series of posts we’ll be showing you how the experts use KNIME in the hopes that by sharing ideas you’ll discover some handy techniques.
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
Greg Landrum's ICCS 2018 presentation is on SlideShare.
KNIME tutorial
Don't forget to sign up for your chance to hear a webinar by Greg Landrum, KNIME's VP for Life Sciences, this Wednesday. He will be talking about processing malaria HTS results using KNIME and will give a tutorial on workflows developed for ligand-based virtual screening, based on results of a phenotypic HTS against malaria.
Wed, Feb 21, 2018 3:00 PM - 4:00 PM GMT
MedChem Wizard KNIME workflow
The MedChemWizard is a KNIME workflow designed to assist medicinal chemists with idea generation, ligand design and lead optimization using a number of common functional group transformations and medchem rules-of-thumb. This tutorial, provided by Dr. Alastair Donald, gives a detailed description of its use.
Workflow tools
KNIME 2.7 released
KNIME 2.7 has been released.
KNIME now runs on Java 7 for Windows and Linux systems (Mac stays on Java 6). Eclipse update 3.7 increases stability on Mac and some Linux systems. BIRT 3.7 brings Open Office support, among other new features.
JFreeChart nodes now have more setting options in the “General Plot Options” tab of their configuration window.
In R -> Local there are a number of new nodes to import:
- “Table to R” can read a KNIME table into R and output the R workspace.
- “R to Table” takes an R workspace and outputs a KNIME table.
- “R + Data to R” takes an R workspace and optional data input and outputs an R workspace.
- “R to R-View” takes an R workspace and outputs a KNIME view.
There is a KNIME tutorial here