
Python use on HPC clusters

GUST USERS:

There is a bug in the version of Slurm running on gust that prevents conda information from propagating correctly to the nodes when running an interactive or batch job. To work around this, be sure to explicitly load your anaconda module within your batch script or interactive session. Please send questions to [[hpc-help]].

NOTE

This guide assumes you understand how to use software environment modules and know which shell you are using. You can check your shell (tcsh or bash) with the command echo $0, e.g.:

>> echo $0
-tcsh

This shows that I use a tcsh shell for my environment (the vast majority of HPC users get tcsh).

Introduction: Python and Anaconda

Python is an extremely popular programming language that is ubiquitous in research computing. Depending on your use case, you may only need to run a simple Python command, or you may need to build a complex Python environment with many separate, interdependent Python modules.

Anaconda is a distribution of Python that comes with its own package manager, conda. The package manager can be used to obtain Python modules from multiple sources. Like standard Python, conda can also create environments for independent sets of Python modules. We strongly recommend Anaconda for all HPC systems since, unlike Python virtual environments, Anaconda creates a completely separate environment with no dependency on the OS Python packages (which are quite old on most HPC systems).

Setting up your environment 

Before you begin computing on HPC, you must choose which subcluster you wish to use.  For the purposes of this documentation, we will use bora/hima subclusters (the bora and hima subclusters share the same front-end, namely, bora).   

To create an Anaconda environment for bora and/or hima, log into the bora front-end. For tcsh users, Anaconda can be added to your environment by loading an appropriate environment module:

module load anaconda/2021.05

For bash users:

Unfortunately, Anaconda does not activate properly using the module system within the bash environment.   Therefore, after loading the anaconda environment module, one additional step is needed.  You will need to enter:

eval "$(conda shell.bash hook)"

on the command line before you activate an Anaconda environment.
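For bash users, the two steps can be combined as in the sketch below. It uses the anaconda/2021.05 module named above; the helper name enable_conda is hypothetical, and the guards are only there so the snippet fails cleanly on a machine where module or conda is unavailable.

```shell
# Bash-only sketch: load Anaconda and enable `conda activate` in this session.
# enable_conda is a hypothetical helper name, not part of conda or the module system.
enable_conda() {
    # Load the module only where the module command exists (e.g. on the cluster).
    if command -v module >/dev/null 2>&1; then
        module load anaconda/2021.05
    fi
    # Without conda on PATH there is nothing to hook into.
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; is the anaconda module loaded?" >&2
        return 1
    fi
    # Define the conda shell function for the current bash session.
    eval "$(conda shell.bash hook)"
}
```

After calling enable_conda in a session, conda activate <env name> should work in that same session. Nothing is written to your shell startup files, which is the reason for preferring this over conda init.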

Unlike the Anaconda documentation, we don't recommend running 

conda init <shell> 

on the HPC cluster since this can disturb your regular shell environment.

Creating the Anaconda environment 

Once Anaconda is properly loaded, you can use:

conda create -n <env name>

to create your environment.

The new environment can be activated using 

conda activate <env name>

Once in the environment, you can use conda install to install packages (see the conda documentation), or install pip and use pip to install packages (see the NOTE below).

To deactivate your current environment, run:

conda deactivate

This returns you to your usual shell environment.
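Put together, the create, activate, install, and deactivate steps look like the sketch below. The environment name myenv and the numpy package are hypothetical placeholders, and the -y flag only answers conda's confirmation prompts. The function is defined here but does nothing until you call it.

```shell
# Sketch of one full cycle; assumes Anaconda is already loaded in this shell
# (and, for bash, that the shell hook has been evaluated).
ENV_NAME="myenv"    # hypothetical environment name; choose your own

build_env() {
    conda create -y -n "$ENV_NAME" python || return 1   # new environment with its own python
    conda activate "$ENV_NAME" || return 1              # switch into it
    conda install -y numpy                              # example package (an assumption)
    conda deactivate                                    # back to the usual shell environment
}
```

Calling build_env runs the four steps in order. Listing python in the create command gives the new environment its own interpreter rather than an empty environment.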

NOTE on using pip in Anaconda environments. Once inside an Anaconda environment, you can use:

conda install pip 

to install the pip package manager into your Anaconda environment. We recommend using:

python -m pip install <python module name>

instead of the usual

pip install <python module name>

since the former will keep all pip-installed modules in your Anaconda environment tree.
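One way to check this, offered as a suggestion rather than a site procedure, is to ask the active interpreter which pip it runs. Inside an activated environment, both reported paths should point into that environment's directory. The helper name show_pip_owner is hypothetical.

```shell
# Report which python is first on PATH and which pip that interpreter runs.
# show_pip_owner is a hypothetical helper name used only for this sketch.
show_pip_owner() {
    if ! command -v python >/dev/null 2>&1; then
        echo "python not found on PATH" >&2
        return 1
    fi
    echo "python: $(command -v python)"
    # pip prints its version and the location it was loaded from.
    python -m pip --version || echo "pip is not installed for this python" >&2
}
```

Run show_pip_owner after conda activate <env name>. If the pip location lies outside the environment, run conda install pip inside the environment first.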

Using Python/Anaconda in your batch jobs 

For interactive jobs, the procedure is the same as described above for a cluster front-end. The procedure is similar for batch jobs, regardless of whether you use Torque or Slurm.

For batch jobs, simply put the module load and conda activate commands in your batch script before running your python script: