Downloading from the RCSB Protein Data Bank using Python
The RCSB Protein Data Bank is an absolutely invaluable resource that provides archive-information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps scientists understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. Currently the PDB contains over 134,000 data files containing structural information on 42547 distinct protein sequences of which 37600 are human sequences. They also provide a series of tools to search, view and analyse the data.
Downloading an individual pdf file is pretty trivial and can be done from the web page as shown in the image below. They also provide a Download Tool launched as stand-alone application using the Java Web Start protocol. The tool is downloaded locally and must be then opened. I've found this a little temperamental and had issues with Java versions and security settings.
Since I've been making extensive use of the web services to interact with RCSB I decided to explore the use of Python to download multiple files. I started off creating a Jupyter notebook using the web services provided by RCSB.
I've also used variations on this code to create a python script and a Vortex script.