Bora and Hima are subclusters of SciClone with Intel Xeon "Broadwell" processors, the former intended for multi-node parallel jobs, and the latter intended for serial and shared-memory jobs. Their front-end is
bora.sciclone.wm.edu and they share the same startup modules file
.cshrc.el7-xeon. They are also the first subclusters to utilize the new parallel file-system mounted at
(bora / bo00)
|Model||Dell PowerEdge R630||Dell PowerEdge R730|
|Processor||Intel Xeon E5-2640 v4||Intel Xeon E5-2683 v4|
|Memory||64 GB||128 GB||256 GB|
|InfiniBand||FDR IB (bo00-ib)||FDR IB (bo??-ib)||QDR IB (hi??-ib)|
|Ethernet||10 GbE (bo00)||1 GbE (bo??)||1 GbE (hi??)|
All access to compute nodes (for either interactive or batch work) is via the TORQUE resource manager, as described elsewhere. TORQUE assigns jobs to a particular set of processors so that jobs do not interfere with each other.
All Bora and Hima nodes have Intel's Hyper-Threading enabled; however, since Bora is intended for MPI parallel jobs, the TORQUE parameter
np is set to 20, the number of physical cores, and the nodes are configured to run only one job at a time. Users should therefore request all 20 cores per node. Hyper-threading can still be used by OpenMP jobs: request the node as exclusive (#PBS -W x=\"NACCESSPOLICY:SINGLEJOB\") and set the number of threads to 2 per MPI process (20 MPI processes each using 2 threads = 40 threads per node); see running parallel jobs with mvp2run for more information. Beware that hyper-threading often slows down individual jobs, so please test your code well before doing production runs.
For Hima, the TORQUE parameter
np is set to 64, one per logical processor. This allows users to run up to 64 processors' worth of serial or shared-memory jobs on one node in order to maximize throughput (not necessarily individual job speed). If you wish neither to use hyper-threading on Hima nor to share the node with other users, you should request ppn=32 and request the node as exclusive (#PBS -W x=\"NACCESSPOLICY:SINGLEJOB\").
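As a rough illustration of the last point, a shared-memory job that takes a whole Hima node without hyper-threading might look like the sketch below. The job name, walltime, and executable name (./my_openmp_program) are placeholders rather than site defaults, and the script does not pin threads to particular cores.

```tcsh
#!/bin/tcsh
#PBS -N hima_omp
#PBS -l nodes=1:hima:ppn=32
#PBS -l walltime=12:00:00
#PBS -W x=\"NACCESSPOLICY:SINGLEJOB\"
#PBS -j oe

cd $PBS_O_WORKDIR

# One OpenMP thread per requested processor (ppn=32).
setenv OMP_NUM_THREADS 32

# Hypothetical executable name; replace with your own program.
./my_openmp_program >& LOG
```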
Again, all Bora nodes, which are intended for multi-node parallel jobs, are configured to run at most one job, and users should take all 20 physical cores per node for all jobs. The nodes have the following TORQUE properties:
bora, broadwell, c21, el7, compute
Since only one job can occupy a single Bora node at one time, the following node specs are sufficient for a pure MPI job:
#PBS -n -l nodes=1:bora:ppn=20 (20 cores on a single node)
#PBS -n -l nodes=4:bora:ppn=20 (80 cores across four nodes)
This also gives correct processor placement, since cores 0-19 are the physical cores.
Since jobs that use fewer than all 20 cores on Bora would still occupy the whole node and make the other cores and threads inaccessible to other users, such jobs should instead use Hima nodes, which allow multiple simultaneous jobs.
Hima nodes have the TORQUE properties
c22. Specify, for example, in your job script:
#PBS -l nodes=1:hima:ppn=1
Of the 7 Hima nodes that are currently available, 2 have GPUs installed. In order to keep these 2 nodes available for GPU use, the 5 Hima nodes without GPUs have an additional TORQUE property
nogpu. For example:
#PBS -l nodes=1:hima:nogpu:ppn=1
This node spec will only run on the Hima nodes without GPUs. Specifying the
nogpu keyword is much fairer to the GPU users. We reserve the right to add this keyword to any Hima job that does not require a GPU.
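Put together, a complete serial job script that stays off the GPU nodes could be sketched as follows. The job name, walltime, and executable name (./my_serial_program) are placeholders chosen for illustration.

```tcsh
#!/bin/tcsh
#PBS -N serial_job
#PBS -l nodes=1:hima:nogpu:ppn=1
#PBS -l walltime=24:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# Hypothetical serial executable; replace with your own program.
./my_serial_program >& LOG
```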
Torque time limit
The maximum walltime for jobs on Bora and Hima is 72 hours. Please take care with this limit: jobs that request more than 72 hours on Bora or Hima nodes currently just remain queued, and no further information is given to the user. We will try to modify this behavior in the future.
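If a job appears stuck in the queue, one way to check whether its walltime request is the cause is to inspect it with the standard TORQUE qstat command. This is only a sketch; the job ID 12345 is a placeholder.

```tcsh
# List your queued and running jobs.
qstat -u $USER

# Show the walltime requested by one job; 12345 is a placeholder job ID.
qstat -f 12345 | grep Resource_List.walltime
```

A value above 72:00:00 means the job will not start on Bora or Hima and should be resubmitted with a shorter walltime.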
To log in, use SSH from any host on the William & Mary or VIMS networks and connect to
bora.sciclone.wm.edu with your HPC username (usually the same as your WMuserid) and W&M password.
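For example, from a terminal on one of those networks (the username jdoe is a placeholder):

```tcsh
# "jdoe" is a placeholder; substitute your own HPC username.
ssh jdoe@bora.sciclone.wm.edu
```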
Your home directory on Bora and Hima is the same as everywhere else on SciClone, and all of the usual filesystems (/sciclone/homeXX, /sciclone/dataXX, /sciclone/scrXX, /local/scr, etc.) are available throughout the Bora and Hima subclusters. Additionally, the parallel filesystem
/sciclone/pscr is available.
SciClone uses Environment Modules (a.k.a. Modules) to automatically configure the user's shell environment across multiple computing platforms, as well as to organize the dozens of different software packages that are available on the system. We support tcsh as the primary shell environment for user accounts and applications.
The file which controls startup modules for Bora and Hima is
.cshrc.el7-xeon. The most recent version of this file can be found in
/usr/local/etc/templates on any of the front-end servers (including
The preferred file system for all work on Bora is the parallel scratch file system available at
/sciclone/pscr/$USER on the front-end and compute nodes.
/sciclone/scr10/$USER is a good alternative (NFS, but connected to the same InfiniBand switch).
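One possible way to run a Bora job out of the parallel scratch file system is sketched below. The per-job directory layout and the input file name (input.dat) are assumptions made for illustration; $PBS_JOBID and $PBS_O_WORKDIR are set by TORQUE.

```tcsh
#!/bin/tcsh
#PBS -N scratch_run
#PBS -l nodes=1:bora:ppn=20
#PBS -l walltime=12:00:00
#PBS -j oe

# Hypothetical per-job run directory on the parallel scratch file system.
set rundir = /sciclone/pscr/$USER/$PBS_JOBID
mkdir -p $rundir
cd $rundir

# Copy the executable and inputs from the submission directory
# (file names are placeholders).
cp $PBS_O_WORKDIR/a.out $PBS_O_WORKDIR/input.dat .

mvp2run ./a.out >& LOG
```

Remember to copy any results you need back out of scratch once the job finishes.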
On Hima, the preferred filesystem is
/local/scr, which on Hima nodes is much larger and faster than the
/local/scr on most other nodes. We intend to rectify this in the future, but Hima presently shares its link to the FDR-hosted global filesystems (
baby10) with Whirlwind and Hurricane, so you may get more consistent performance from
Bora and Hima have the Intel Parallel Studio XE 2017 compiler suite as well as version 4.9.4 of the GNU compiler suite. Here are suggested compiler flags which should result in fairly optimized code on their Broadwell architecture (taken from http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Haswell.pdf):
|Intel||C||icc -O3 -xCORE-AVX2 -fma -align -finline-functions|
|C++||icpc -std=c++11 -O3 -xCORE-AVX2 -fma -align -finline-functions|
|Fortran||ifort -O3 -xCORE-AVX2 -fma -align array64byte -finline-functions|
|GNU||C||gcc -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions|
|C++||g++ -std=c++11 -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions|
|Fortran||gfortran -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions|
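As an illustration of how the flags in the table are used, full compile commands might look like the following. The source and output file names are placeholders, and the sketch assumes the corresponding compiler is already available in your environment.

```tcsh
# Intel C compiler with the suggested flags (prog.c is a placeholder).
icc -O3 -xCORE-AVX2 -fma -align -finline-functions -o prog prog.c

# GNU Fortran compiler with the suggested flags (prog.f90 is a placeholder).
gfortran -march=broadwell -O3 -mfma -malign-data=cacheline -finline-functions -o prog prog.f90
```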
Currently there are three versions of MPI available on the Bora subcluster:
openmpi (v2.1.1), intel-mpi (v2016 and v2017), and
mvapich2-ib (v2.2). OpenMPI and MVAPICH2 should both be used through the
mvp2run wrapper script. OpenMPI can launch both pure MPI and hybrid MPI/OpenMP jobs, while MVAPICH2 can only launch pure MPI jobs. For pure MPI jobs, the syntax is the same for both:
#!/bin/tcsh
#PBS -N MPI
#PBS -l nodes=5:bora:ppn=20
#PBS -l walltime=12:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
mvp2run ./a.out >& LOG
Hima nodes hi04-hi07 are now each equipped with one Tesla-class GPU. Nodes hi04 and hi05 have an NVIDIA Tesla P100, while hi06 and hi07 have a V100. All GPUs have 16 GB of memory. All nodes have CUDA v9.1 installed.
Hima nodes with a GPU can be specified in the Torque batch system by using the following node specs:
-l nodes=1:hima:v100:ppn=64 # to select a hima node with a v100 GPU
-l nodes=1:hima:p100:ppn=64 # to select a hima node with a p100 GPU
-l nodes=1:hima:gpu:ppn=64 # to select a hima node with either a p100 or v100 GPU
Since only one user at a time can access the GPU, we suggest that users take the whole Hima node (i.e. ppn=64) if they plan to use it.
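A whole-node GPU job along these lines might be sketched as follows. The job name, walltime, and executable name (./my_cuda_program) are placeholders, and the script assumes the nvidia-smi utility from the NVIDIA driver is on the default path of the GPU nodes.

```tcsh
#!/bin/tcsh
#PBS -N gpu_job
#PBS -l nodes=1:hima:gpu:ppn=64
#PBS -l walltime=12:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# Report which GPU the job landed on (P100 or V100).
nvidia-smi

# Hypothetical CUDA executable; replace with your own program.
./my_cuda_program >& LOG
```

Replace the gpu property with p100 or v100 if the code needs a specific GPU model.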
Please send email to email@example.com if you have questions about setting up jobs or installing software to take advantage of the Hima GPUs.