Using the Python 3 library FPSim2 for similarity searches
FPSim2 is a new tool, under development at ChEMBL, for fast similarity searches on large compound datasets (more than 100 million compounds). It is a Python 3 library that supports both in-memory and out-of-core searches on datasets of that size.
It is built on RDKit and can be installed with conda. It requires Python 3.6 and a recent version of RDKit.
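To make concrete what a fingerprint similarity search computes, here is a minimal pure-Python sketch. It is not FPSim2's API or implementation. It assumes the Tanimoto coefficient as the similarity measure and stores each fingerprint as a Python integer used as a bitset. The names `tanimoto`, `similarity_search` and the toy molecule IDs are made up for illustration.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        # Two empty fingerprints: define the similarity as 0.0.
        return 0.0
    return bin(a & b).count("1") / union


def similarity_search(query: int, database: dict, threshold: float) -> list:
    """Return (id, score) pairs with score >= threshold, best match first."""
    hits = []
    for mol_id, fp in database.items():
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((mol_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy 8-bit fingerprints (hypothetical data, not real compounds).
database = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110011,
    "mol_c": 0b01001100,
}
query = 0b10110010

results = similarity_search(query, database, threshold=0.7)
print(results)
```

A linear scan like this is fine for a handful of molecules. A dedicated library is needed at the 100 million compound scale, where speed and memory use start to matter.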
I've written a couple of Jupyter notebooks to demonstrate its use.
You can read the full tutorial here, and download the notebooks.