Macs in Chemistry

Insanely great science


Getting PDB information.

A while back I published two scripts that use UniChem, a web resource provided by the EBI. UniChem is a 'Unified Chemical Identifier' system designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases.

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI:

The first script uses the ChEMBL ID to search for other identifiers; the second allows more flexible searching using any of the identifiers available within UniChem. One of the identifiers returned is from PDBe (Protein Data Bank in Europe) and represents the ID of the ligand in the PDB. Whilst this is interesting, it would also be very useful to have the identity of the crystal structures that contain the ligand. Fortunately PDBe provides a series of web services that can be used to interrogate the database, together with a really useful page to help build the calls.

For our needs the query format is-

and the data is returned as-


The first part of the script creates a dialog box allowing the user to identify the column containing the PDB ligand ID. We then work through the rows in the workspace, generating a query string for each ligand ID, submitting it to the web service and reading the data that comes back.
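The row-by-row lookup described above can be sketched outside Vortex as follows. This is only a minimal sketch in plain Python 3 rather than the Jython that Vortex uses. `URL_TEMPLATE` is a placeholder and not the real PDBe address, and `build_query`, `lookup_all` and the `fetch` argument are hypothetical names introduced for illustration.

```python
from urllib.parse import quote

# Placeholder only (an assumption): substitute the real PDBe query URL,
# keeping one %s where the ligand ID belongs.
URL_TEMPLATE = "https://example.org/in_pdb/%s"


def build_query(ligand_id, template=URL_TEMPLATE):
    """Return the query URL for one ligand ID, URL-escaping the ID."""
    return template % quote(ligand_id.strip(), safe="")


def lookup_all(ligand_ids, fetch):
    """Call fetch(url) once per non-empty ligand ID.

    fetch is any callable that takes a URL and returns the response body.
    Failures are recorded as None so one bad row does not stop the loop.
    """
    results = {}
    for ligand_id in ligand_ids:
        ligand_id = (ligand_id or "").strip()
        if not ligand_id:
            continue  # skip empty cells
        try:
            results[ligand_id] = fetch(build_query(ligand_id))
        except IOError:
            # urllib's HTTPError and URLError are both subclasses of IOError
            results[ligand_id] = None
    return results
```

Passing the fetcher in as an argument keeps the loop testable without a network connection. In real use `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`.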

The data is returned in JSON format as shown above. Since a ligand may be present in many PDB structures, the result can be a list of PDB codes. We use "join" to convert the list to a comma-separated string, then create and populate a new column.

j = json.loads(molecule_record)
pdbstring = ', '.join(j[PDBligand_id])
colPDBID = vtable.findColumnWithName('PDB_ID', 1)
colPDBID.setValueFromString(r, pdbstring)
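The same conversion can be tried in plain Python, away from Vortex. This is a minimal sketch, assuming the response is a JSON object keyed by the ligand ID with a list of PDB codes as its value, as the snippet above implies. The sample payload and the function name `pdb_codes_to_string` are made up for illustration.

```python
import json


def pdb_codes_to_string(response_text, ligand_id):
    """Return the PDB codes for ligand_id as a comma-separated string.

    An ID missing from the response gives an empty string rather than
    raising a KeyError.
    """
    record = json.loads(response_text)
    return ", ".join(record.get(ligand_id, []))


# Made-up payload for illustration, not real PDBe output.
sample = '{"XYZ": ["1abc", "2def", "3ghi"]}'
print(pdb_codes_to_string(sample, "XYZ"))
```

Using `get` with an empty list as the default means a ligand ID that is absent from the response simply produces an empty cell.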


The Vortex Script

#Script to get a list of PDB entries that contain the compound defined in the PDB Chemical Component Dictionary, (from unichem search)

# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json

# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os

input_label = swing.JLabel("PDB (for input)")
input_cb = workspace.getColumnComboBox()
panel = swing.JPanel()

layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb,    1, 0)

ret = vortex.showInDialog(panel, "Choose PDB ligand column")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()

    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        # Index 0 is the empty entry in the combo box, hence the offset
        chosen_col = vtable.getColumn(input_idx - 1)

        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            PDBligand_id = chosen_col.getValueAsString(r)

            # Query URL template; it needs one %s for the ligand ID
            api_url = '' % PDBligand_id
            try:
                molecule_record = urllib2.urlopen(api_url).read()
            except urllib2.HTTPError:
                # No result for this ligand ID, move on to the next row
                continue
            j = json.loads(molecule_record)
            # A ligand can occur in many structures, so join the list of codes
            pdbstring = ', '.join(j[PDBligand_id])
            colPDBID = vtable.findColumnWithName('PDB_ID', 1)
            colPDBID.setValueFromString(r, pdbstring)

The script can be downloaded from here

Last Updated 31 May 2017