Modin for distributed Pandas calculations
Modin is a library that accelerates pandas by automatically distributing computation across all of the system’s available CPU cores. It uses Ray or Dask as its execution engine, so existing pandas notebooks, scripts, and libraries can be sped up with minimal changes. Unlike other distributed DataFrame libraries, Modin aims for seamless compatibility with existing pandas code; even the DataFrame constructor is identical. Modin is designed for datasets from 1MB to 1TB+.
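As a minimal sketch of what that compatibility looks like in practice, the snippet below changes only the import line and leaves the rest as ordinary pandas code. The fallback to plain pandas and the sample column names are assumptions added for illustration, not something Modin requires.

```python
# Sketch: the only change from plain pandas is the import line.
try:
    import modin.pandas as pd  # Modin's drop-in replacement for pandas
except ImportError:
    import pandas as pd  # assumption for this sketch: fall back if Modin is missing

# The DataFrame constructor call is the same under either import.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "sales": [10, 20, 30]})

# Ordinary pandas-style operations work unchanged.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

On a frame this small, plain pandas will likely be faster, since distributing the work has overhead; the point here is only that the code does not change.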
It can be installed using pip:
pip install modin
If you don't have Ray or Dask installed, you will need to install Modin with one of the targets:
pip install modin[ray]   # Install Modin dependencies and Ray to run on Ray
pip install modin[dask]  # Install Modin dependencies and Dask to run on Dask
pip install modin[all]   # Install all of the above
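If you are unsure which case applies to your environment, a rough check like the one below may help. It is a sketch using only the standard library: the helper names `installed_engines` and `suggested_install` are hypothetical, and treating an importable top-level `ray` or `dask` package as "engine available" is an approximation.

```python
import importlib.util


def installed_engines():
    """Return which of the candidate engine packages can be found for import."""
    candidates = ("ray", "dask")
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def suggested_install(engines):
    """Pick a pip command: plain Modin if an engine exists, else a target with one."""
    if engines:
        return "pip install modin"
    # No engine found: choosing the Ray target here is an arbitrary assumption;
    # modin[dask] or modin[all] would work as well.
    return "pip install modin[ray]"


if __name__ == "__main__":
    found = installed_engines()
    print("Engines found:", found or "none")
    print("Suggested command:", suggested_install(found))
```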
Currently, Modin depends on pandas version 0.23.4.
I've added Modin to the Open Source Data Science Python Libraries.