W&M's HPC systems have several types of user filesystems, intended for different purposes. These can be grouped into two broad categories:
- Global filesystems are identified with the prefix /sciclone/ or /ches/ and are accessible on every node within SciClone or Chesapeake, respectively.
- Local filesystems begin with the prefix /local/ and are physically resident on one or more disks directly attached to a node. Since operations on local filesystems never travel over one of the cluster's networks and are less subject to competition from other users' jobs, they may provide better, more consistent performance than global filesystems; however, their contents are only accessible on the local node.
An additional distinction is made between home, data, and scratch filesystems, according to their intended use and the corresponding backup and deletion policies:

| Type | Intended use | Backups | Deletion |
|------|--------------|---------|----------|
| Home | Source code, executables, configuration files, scripts, and small (<1 GB total) data files. Unless you have been directed otherwise, you should not have a job read or write any substantial amount of data to your home directory, as doing so is extremely likely to impact others' interactive work. | Weeknightly, on-site only | After account expiration. |
| Data | Data that are needed on an ongoing basis for active projects on the cluster and cannot be easily re-created or re-uploaded. Please do not have batch jobs write a substantial amount of data to data filesystems; use the scratch filesystems for job output unless you have permission from HPC staff. | | |
| Scratch | Job outputs and working data that can be easily re-created or re-uploaded, or which will be copied elsewhere for longer-term storage. | Never | Any files not accessed for 90 days, and after account expiration. |
When a user account is installed, a home directory for that user is created on each cluster (SciClone and Chesapeake) in one of its home filesystems. Additionally, subdirectories with the user's login name are created in each data, global scratch, and local scratch filesystem. As a convenience, symlinks in each user's home directory point to the preconfigured user directories. After a user's account has expired, all of these directories become subject to deletion.
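The symlink arrangement can be illustrated with a small sketch. The directory layout below is a stand-in built in a temporary directory so it runs anywhere; the name `data10` is taken from the /sciclone/data10 example later in this page, and the actual link names in your home directory may differ (check with `ls -l ~` on the cluster).

```shell
user=${USER:-someone}                   # fall back if USER is unset
demo=$(mktemp -d)                       # stand-in root so the sketch runs anywhere
mkdir -p "$demo/sciclone/data10/$user"  # stand-in for a per-user data directory
mkdir -p "$demo/home/$user"             # stand-in for the user's home directory
# The convenience symlink: ~/data10 -> the user's directory on the data filesystem
ln -s "$demo/sciclone/data10/$user" "$demo/home/$user/data10"
readlink "$demo/home/$user/data10"      # prints the data-filesystem path
```

A job can therefore refer to `~/data10/...` and transparently reach the user's directory on the corresponding global filesystem.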
Due to SciClone's and Chesapeake's heterogeneity and the resulting network topology, there are asymmetries in how efficiently a global filesystem can be used from different nodes. For example, accessing /sciclone/scr20 from a Hurricane or Whirlwind node passes through only one switch, but from a Vortex or Bora node it passes through two switches, and from a Hail or Wind node through three, increasing latency and the potential for bottlenecks at the links between switches.
In a more extreme example, while Meltemi nodes have a 100 Gb/s Omni-Path connection to /sciclone/scr-mlt, all other SciClone nodes can access /sciclone/scr-mlt only over 1 Gb/s Ethernet.
On the subcluster pages, we make recommendations as to which filesystems are likely to work best with a particular subcluster. That said, we do make filesystems available globally, even on subclusters from which access is (usually only slightly) less efficient, so that users can exercise judgment in determining where to place data needed for work on multiple subclusters, and to avoid fragmenting our storage capacity.
Global filesystem names have a two-digit suffix (e.g., /sciclone/data10) which serves not only to distinguish a filesystem from others of the same type, but also to indicate the underlying storage architecture. A suffix beginning with "0" typically indicates a single internal disk drive within a server, while one beginning with "1", "2", etc. indicates a filesystem that spans one or more disk arrays, each consisting of multiple drives, usually in a RAID configuration. This lets users easily distinguish array-based filesystems, which are larger and faster, from their single-drive counterparts.
Local scratch filesystems labeled /local/scrX formerly provided the ability to distinguish between local scratch on different disk drives, enabling IO-intensive applications to place files in different filesystems and thereby minimize head movement. Nowadays, however, for ease of use, if a compute node has more than one local scratch disk, we generally stripe a single /local/scr filesystem across all of them.
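A common pattern for IO-heavy jobs follows from the above: do the heavy, temporary writing in local scratch, then copy only the results worth keeping to a global scratch directory. The sketch below uses temporary stand-in directories so it runs anywhere; on the cluster, `LOCAL_SCR` would be something like /local/scr/$USER, and `GLOBAL_SCR` a directory of yours on a global scratch filesystem (the specific choice depends on your subcluster).

```shell
LOCAL_SCR=$(mktemp -d)    # stand-in for /local/scr/$USER
GLOBAL_SCR=$(mktemp -d)   # stand-in for a global scratch directory

workdir="$LOCAL_SCR/job_$$"          # unique per-job working directory
mkdir -p "$workdir"
cd "$workdir"

echo "intermediate data" > work.tmp  # the job's heavy, temporary I/O stays local
echo "final result" > result.dat

cp result.dat "$GLOBAL_SCR/"         # keep only what you need on global storage
cd / && rm -rf "$workdir"            # clean up local scratch when done
```

This keeps the job's most intensive I/O off the cluster networks while still leaving the results somewhere accessible from other nodes.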
All backups remain on the same campus as the corresponding cluster. Chesapeake's backups are on a file server that is part of Chesapeake and is in the same room as the rest of Chesapeake. SciClone writes its backups to a tape library in another building, but on the same campus, less than half a mile away. Both schemes protect against accidental deletions, filesystem corruption, and hardware failures, and SciClone's additionally protects against loss (e.g. in a fire) of the room and building housing the cluster, but neither scheme is as secure as off-line, off-site backups would be. If your data require off-site backup, you must provide for it yourself.
Furthermore, capacity allows us to keep only about one month of backups, so it is important that you let us know as soon as possible if you need something restored. If your data need protection against loss that remains undiscovered for longer, you should make at least one additional backup of your own.
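One minimal way to keep an additional backup of your own is a date-stamped tar archive of a data directory. The paths below are temporary stand-ins so the sketch runs anywhere; in practice the source would be your directory on a data filesystem, and the resulting archive should be copied somewhere independent of the cluster (an off-site host of your choosing).

```shell
SRC=$(mktemp -d)          # stand-in for a directory on a data filesystem
BACKUP_DIR=$(mktemp -d)   # stand-in for your backup destination
echo "important input" > "$SRC/input.dat"

# Date-stamped, compressed archive of the directory's contents
archive="$BACKUP_DIR/data-$(date +%Y%m%d).tar.gz"
tar -C "$SRC" -czf "$archive" .
tar -tzf "$archive"       # list what was captured, to verify the backup
```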
From time to time, additional "project" filesystems may be provisioned for specific projects or research groups.
Several system filesystems are also present throughout the clusters. Filesystems such as /tmp are local to individual nodes, while others such as /import are hosted on the respective platform servers and exported to their client nodes via NFS. Note that on our systems, the /tmp filesystem is of very limited size, its public permissions leave files relatively unsecured, and its contents are often wiped clean on a reboot. Users should not explicitly store files in /tmp; use /local/scr/$USER instead. The default login scripts set your TMPDIR environment variable accordingly.
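Because TMPDIR points at local scratch, tools that honor it (such as `mktemp`) keep temporary files off /tmp automatically, with no change to your scripts. A small sketch; the fallback to /tmp is only so it runs where TMPDIR happens to be unset:

```shell
: "${TMPDIR:=/tmp}"                  # on the cluster, login scripts set this already
scratch=$(mktemp -d)                 # created under $TMPDIR, not hard-coded to /tmp
echo "working data" > "$scratch/work.dat"
rm -rf "$scratch"                    # clean up when finished
```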