
Bora / Hima

Bora and Hima are subclusters of SciClone with Intel Xeon "Broadwell" processors; the former is intended for multi-node parallel jobs and the latter for serial and shared-memory jobs. Their front-end is bora.sciclone.wm.edu and they share the same startup modules file, .cshrc.el7-xeon. They are also the first subclusters to use the new parallel filesystem mounted at /sciclone/pscr/$USER.

Hardware

                      Front-end               Parallel nodes          Serial/shared-memory
                      (bora / bo00)           (bo01-bo55)             nodes (hi01-hi07)

Model                 Dell PowerEdge R630     Dell PowerEdge R630     Dell PowerEdge R730
Processor(s)          2×10-core               2×10-core               2×16-core
                      Intel Xeon E5-2640 v4   Intel Xeon E5-2640 v4   Intel Xeon E5-2683 v4
Clock speed           2.4 GHz                 2.4 GHz                 2.1 GHz
Memory                64 GB                   128 GB                  256 GB
Network interfaces
  Application         FDR IB (bo00-ib)        FDR IB (bo??-ib)        QDR IB (hi??-ib)
  System              10 GbE (bo00)           1 GbE (bo??)            1 GbE (hi??)
OS                    CentOS 7.3              CentOS 7.3              CentOS 7.3

Torque node specifiers

All access to compute nodes (for either interactive or batch work) is via the TORQUE resource manager, as described elsewhere. TORQUE assigns jobs to a particular set of processors so that jobs do not interfere with each other.

All Bora nodes, which are intended for multi-node parallel jobs, are configured to run at most one job at a time, so users should request all 20 physical cores per node. The only exception is a parallel job that cannot use all the cores for memory reasons. The nodes have the following TORQUE properties:

  bora, broadwell, c21, el7, compute

Since only one job can occupy a single Bora node at one time, the following node specs are sufficient for a pure MPI job:

  #PBS -l nodes=1:bora:ppn=20 (20 cores on a single node)
  #PBS -l nodes=4:bora:ppn=20 (80 cores across four nodes)

This also gives correct processor placement, since cores 0-19 are the physical cores.

Jobs that use fewer than all 20 cores on Bora would still occupy the whole node and make the remaining cores inaccessible to other users, so such jobs should instead use Hima nodes, which allow multiple simultaneous jobs.

Hima nodes have the TORQUE properties hima, broadwell, and c22. Specify, for example, in your job script:

  #PBS -l nodes=1:hima:ppn=1

Specifying Hima nodes without GPUs

Of the 7 Hima nodes that are currently available, 4 (hi04-hi07) have GPUs installed.  In order to keep these nodes available for GPU use, the 3 Hima nodes without GPUs have an additional TORQUE property, nogpu. For example:

  #PBS -l nodes=1:hima:nogpu:ppn=1

This node spec will only run on the Hima nodes without GPUs. Specifying the nogpu property when your job does not need a GPU is fairer to the GPU users; we reserve the right to add it to any Hima job which does not require a GPU.

Torque time limit

The maximum walltime for jobs on Bora and Hima is 72 hours.  Please be careful about this limit: jobs that request more than 72 hours on Bora or Hima nodes currently just remain queued, with no message to the user. We will try to modify this behavior in the future.
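
For example, to request the full 72 hours explicitly in a job script (any larger value will never start):

  #PBS -l walltime=72:00:00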


User Environment

To login, use SSH from any host on the William & Mary or VIMS networks and connect to bora.sciclone.wm.edu with your HPC username (usually the same as your WMuserid) and W&M password.
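
For example, from a machine on one of those networks (substitute your own HPC username for WMuserid):

  ssh WMuserid@bora.sciclone.wm.edu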

Your home directory on Bora and Hima is the same as everywhere else on SciClone, and all of the usual filesystems (/sciclone/homeXX, /sciclone/dataXX, /sciclone/scrXX, /local/scr, etc.) are available throughout the Bora and Hima subclusters. Additionally, the parallel filesystem /sciclone/pscr is available.

SciClone uses Environment Modules (a.k.a. Modules) to automatically configure the user's shell environment across multiple computing platforms and to organize the dozens of different software packages available on the system. We support tcsh as the primary shell environment for user accounts and applications.

The file which controls startup modules for Bora and Hima is .cshrc.el7-xeon. The most recent version of this file can be found in /usr/local/etc/templates on any of the front-end servers (including bora.sciclone.wm.edu).
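
A few commonly used Modules commands are shown below; the module name intel/2017 is only an illustrative guess, so use module avail to see what is actually installed on Bora and Hima:

  module list             # modules loaded at login by .cshrc.el7-xeon
  module avail            # modules available on this platform
  module show intel/2017  # inspect a module (hypothetical name; pick one from 'module avail')
  module load intel/2017  # add it to the current environment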


Preferred filesystems

The preferred file system for all work on Bora is the parallel scratch file system available at /sciclone/pscr/$USER on the front-end and compute nodes. /sciclone/scr10/$USER is a good alternative (NFS, but connected to the same InfiniBand switch).

On Hima, the preferred filesystem is /local/scr, which on Hima nodes is much larger and faster than the /local/scr on most other nodes. Hima presently shares its link to the FDR-hosted global filesystems (/sciclone/pscr, scr10, data10, aiddata10, and baby10) with Whirlwind and Hurricane (we intend to rectify this in the future), so you may get more consistent performance from /sciclone/scr20.
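
As a minimal sketch (not a definitive recipe), a Bora job might stage its work in the parallel scratch area like this; the input file and executable names are placeholders, and it assumes /sciclone/pscr/$USER already exists:

#!/bin/tcsh
#PBS -N pscr-example
#PBS -l nodes=1:bora:ppn=20
#PBS -l walltime=01:00:00
#PBS -j oe

# work in a job-specific directory on the parallel scratch filesystem
set workdir = /sciclone/pscr/$USER/$PBS_JOBID
mkdir -p $workdir
cd $workdir

# stage inputs from the submission directory, run, and copy results back
cp $PBS_O_WORKDIR/input.dat .        # hypothetical input file
mvp2run ./a.out >& LOG
cp LOG $PBS_O_WORKDIR/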


Compiler flags

Bora and Hima have the Intel Parallel Studio XE 2017 compiler suite as well as version 4.9.4 of the GNU compiler suite. Here are suggested compiler flags which should result in fairly optimized code on their Broadwell architecture (taken from http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Haswell.pdf):

Intel  C        icc -O3 -xCORE-AVX2 -fma -align -finline-functions
       C++      icpc -std=c++11 -O3 -xCORE-AVX2 -fma -align -finline-functions
       Fortran  ifort -O3 -xCORE-AVX2 -fma -align array64byte -finline-functions
GNU    C        gcc -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions
       C++      g++ -std=c++11 -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions
       Fortran  gfortran -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions
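
For example, a Fortran source file (here the hypothetical mycode.f90) could be compiled with the Intel flags above as:

  ifort -O3 -xCORE-AVX2 -fma -align array64byte -finline-functions -o mycode mycode.f90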

MPI

Currently there are three versions of MPI available on the Bora subcluster: openmpi (v2.1.1), intel-mpi (v2016 and v2017), and mvapich2-ib (v2.2).  All of these should be used through the mvp2run wrapper script.  OpenMPI can launch both pure MPI and hybrid MPI/OpenMP jobs, while MVAPICH2 can only launch pure MPI jobs. For pure MPI jobs, the syntax is the same for all of them:


#!/bin/tcsh 
#PBS -N MPI 
#PBS -l nodes=5:bora:ppn=20 
#PBS -l walltime=12:00:00 
#PBS -j oe 

cd $PBS_O_WORKDIR 

mvp2run ./a.out >& LOG
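
For a hybrid MPI/OpenMP job under OpenMPI, the usual approach is to start fewer MPI ranks per node and give each rank several OpenMP threads. The sketch below assumes the executable was built with OpenMP support; the -c option shown for limiting ranks per node is an assumption, so check mvp2run's help or the SciClone documentation for its actual placement options:

#!/bin/tcsh
#PBS -N hybrid
#PBS -l nodes=2:bora:ppn=20
#PBS -l walltime=12:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# 4 MPI ranks per node x 5 OpenMP threads per rank = 20 cores per node
setenv OMP_NUM_THREADS 5
mvp2run -c 4 ./a.out >& LOG    # '-c 4' (ranks per node) is an assumption; verify against mvp2run's options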

Hima GPUs

Hima nodes hi04-hi07 are now each equipped with one Tesla-class GPU.  Nodes hi04 and hi05 have an Nvidia Tesla P100, while hi06 and hi07 have a Tesla V100.  All GPUs have 16 GB of memory, and all of these nodes have CUDA v9.1 installed.

Hima nodes with a GPU can be specified in the Torque batch system by using the following node specs:

-l nodes=1:hima:v100:ppn=32   # to select a hima node with a v100 GPU 

-l nodes=1:hima:p100:ppn=32   # to select a hima node with a p100 GPU 

-l nodes=1:hima:gpu:ppn=32  # to select a hima node with either a p100 or v100 GPU 

Since only one user at a time can access the GPU, we suggest that users take the whole Hima node (i.e., ppn=32) if they plan to use it.
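
A minimal sketch of a GPU job script follows; the CUDA module name cuda/9.1 and the executable name are assumptions, so check module avail on Hima for the actual CUDA 9.1 module:

#!/bin/tcsh
#PBS -N gpu-job
#PBS -l nodes=1:hima:v100:ppn=32
#PBS -l walltime=12:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# load the CUDA toolkit; 'cuda/9.1' is a guess at the module name
module load cuda/9.1

./my_gpu_code >& LOG    # hypothetical CUDA executable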

Please send email to hpc-help@wm.edu if you have questions about setting up jobs or installing software to take advantage of the Hima GPUs.