

Using the W&M Kubernetes (K8s) cluster

NOTE:  You must request access to the k8s cluster via hpchelp once you have your hpc account.

Overview

Access: The front-end/login server for the RC Kubernetes cluster is cm.geo.sciclone.wm.edu. You must be logged into this server to access the Kubernetes cluster.

The Research Computing (RC) Kubernetes cluster is a newer resource than the traditional HPC/Slurm batch clusters. It was created to support research workflows that benefit from containerized environments, reproducibility, and flexible scaling. Singularity-based containers can be used on the W&M HPC/Slurm batch clusters, but they make up a minority of jobs there; in Kubernetes, every workload runs in a container.

Both systems let you run scripts to launch workloads, but the style and structure differ:

  • HPC/Slurm: Jobs are typically submitted via shell scripts with directives (e.g., #SBATCH flags) that define resources and runtime behavior.

  • Kubernetes: Workloads are defined in YAML files, which describe the desired state of your application (resources, containers, and policies).

In short: HPC uses directives in scripts, while Kubernetes uses declarative configuration.
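To make the contrast concrete, a minimal Slurm batch script might look like the following sketch (the #SBATCH directives shown are standard Slurm flags, not site-specific settings):

```shell
#!/bin/bash
#SBATCH --job-name=demo        # directives look like comments to the shell,
#SBATCH --cpus-per-task=2      # but Slurm parses them at submission time
#SBATCH --mem=8G               # request 8 GiB of memory
#SBATCH --time=01:00:00        # request a 1-hour walltime

msg="hello from a batch job"
echo "$msg"
```

The Kubernetes equivalents of these resource directives are expressed declaratively, as shown in the YAML examples later on this page.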

Why Use Kubernetes?

  • Containerization: Run software in reproducible environments without worrying about module systems or dependencies.

  • Scalability: Spin up multiple pods/jobs easily for parallel workloads.

  • Flexibility: Good for workflows involving services (databases, dashboards, ML model serving) that HPC batch queues don’t handle well.

Namespaces

Namespaces in Kubernetes provide a way to organize and isolate workloads so that different research groups or applications don't interfere with each other. In practice, a namespace is often just an abstraction of a username; in fact, most namespaces in our k8s cluster are simply usernames. Our k8s cluster has two types of namespaces:

User namespaces
  • Named after your username (e.g., jdoe).
  • Intended for short-running jobs (maximum walltime is less than 5 days).
  • Restriction: You may only run Jobs here — pods are not allowed.
Project namespaces
  • Shared by a group of users working on the same project.
  • Suitable for production or collaborative workloads.
  • You can run both Pods and Jobs here, and pods have no restriction on walltime.
This separation helps ensure that testing and one-off jobs don’t interfere with shared project workloads.

Pods vs. Jobs

In most Kubernetes clusters:

  • Pods are the fundamental unit. A pod usually runs one container (though it can run multiple tightly coupled ones). Think of a pod as “one compute job” on HPC, but without a built-in runtime limit.

  • Jobs in Kubernetes are a higher-level object that manages pods. They ensure that a task runs to completion—restarting pods if necessary. A Job is closer to an HPC batch job, where the system ensures your work finishes, even if something fails mid-way.

Key differences to note:

  • Like the Slurm/Batch clusters, jobs have a maximum walltime (currently 5 days for k8s jobs)

  • In Slurm, the scheduler directly allocates nodes/cores. In Kubernetes, you request resources (CPU, memory, GPU) via YAML, and the scheduler fits your pod/job onto available nodes.
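For example, the resources portion of a Kubernetes container spec separates what the scheduler uses for placement from the hard runtime caps (the numbers here are illustrative):

```yaml
resources:
  requests:          # the scheduler places the pod on a node with this much free
    cpu: "2"
    memory: "8Gi"
  limits:            # hard caps enforced while the container runs
    cpu: "4"
    memory: "16Gi"
```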

Here are two examples that do the same work: the first defines a Pod, the second a Job. These two objects do nearly the same thing:

  • Creates a Pod named python-pod in the project1 namespace.

  • Runs a single container named python.

  • Pulls the image laudio/pyodbc (default registry: docker.io) available on:  https://github.com/laudio/pyodbc

  • Starts the container with /bin/sh -ec "sleep 300" (execute command string; exit on any error).

  • Sets restartPolicy: OnFailure (restart only if the container exits non-zero).

  • Runs processes as UID 1719 and GID 1121 via Pod securityContext (use the Linux command id to find your UID and GID)

  • Requests at least 8 GiB RAM, 2 CPU cores, and 1 GPU (for scheduling).

  • Limits the container to 16 GiB RAM, 4 CPU cores, and 1 GPU (hard caps)  (PLEASE DON'T REQUEST a GPU if you don't need one)

  • Mounts NFS volume data10 from server lunar path /sciclone/data10/ewalter to /tmp/mydata in the container.

  • Mounts NFS volume scr10 from server scr10 path /sciclone/scr10/ewalter to /tmp/my10 in the container.

  • Declares both NFS volumes under spec.volumes and references them under volumeMounts.

Example pod:

 Example of a pod.yml file for k8s
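Putting the bullet points above together, the pod.yml looks approximately like this (a sketch reconstructed from the description above; substitute your own UID/GID, namespace, and storage paths):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: python-pod
  namespace: project1
spec:
  restartPolicy: OnFailure        # restart only if the container exits non-zero
  securityContext:
    runAsUser: 1719               # your UID (see the `id` command)
    runAsGroup: 1121              # your GID
  containers:
    - name: python
      image: laudio/pyodbc        # pulled from docker.io by default
      command: ["/bin/sh", "-ec", "sleep 300"]
      resources:
        requests:                 # used by the scheduler for placement
          memory: "8Gi"
          cpu: "2"
          nvidia.com/gpu: 1       # don't request a GPU if you don't need one
        limits:                   # hard caps
          memory: "16Gi"
          cpu: "4"
          nvidia.com/gpu: 1
      volumeMounts:
        - name: data10
          mountPath: /tmp/mydata
        - name: scr10
          mountPath: /tmp/my10
  volumes:
    - name: data10
      nfs:
        server: lunar
        path: /sciclone/data10/ewalter
    - name: scr10
      nfs:
        server: scr10
        path: /sciclone/scr10/ewalter
```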

For the job example, here is a summary of how it differs from a pod definition: 

  • Uses kind: Job (apiVersion: batch/v1) instead of a Pod; a controller manages Pods to ensure completion.

  • Job wraps the Pod under spec.template (Pod spec lives inside a template).

  • Job adds activeDeadlineSeconds: 60 — kills the Job if it hasn’t finished within 60s.

  • Job adds ttlSecondsAfterFinished: 30 — automatically deletes Job resources ~30s after it finishes.

  • Pod name is generated from the Job (e.g., python-job-xxxxx) rather than being fixed like python-pod.

  • Job defaults to retrying failed Pods (backoff, up to a limit) until success or deadline; a plain Pod does not have Job-style retries.

  • Valid Pod restart policies for a Job are OnFailure or Never (and are set under template.spec), while a standalone Pod can use other patterns but won’t have Job semantics.

  • Job tracks completion status (succeeded/failed counts, conditions), which Pods alone don’t aggregate.  

Example job:


 Example of a job.yml file for k8s
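Combining these differences with the pod spec above, the job.yml looks approximately like this (a sketch reconstructed from the description; the container resources and volumes are the same as in the pod example):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: python-job
  namespace: project1
spec:
  activeDeadlineSeconds: 60       # kill the Job if not finished within 60s
  ttlSecondsAfterFinished: 30     # delete Job resources ~30s after it finishes
  template:                       # the Pod spec lives inside this template
    spec:
      restartPolicy: OnFailure    # Jobs allow only OnFailure or Never
      securityContext:
        runAsUser: 1719
        runAsGroup: 1121
      containers:
        - name: python
          image: laudio/pyodbc
          command: ["/bin/sh", "-ec", "sleep 300"]
          # resources and volumeMounts as in the pod example
      # volumes as in the pod example
```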

Using kubectl - the main command for k8s 

In Kubernetes, almost everything you do as a user goes through kubectl - it’s the command-line interface that talks to the Kubernetes API server.

To submit a yml file to run a pod (remember, only project namespaces can use pods):
[36 ewalter@cm ~ ]$kubectl apply -f webpod.yml
pod/python-pod created

To check the status of the pod:

[37 ewalter@cm ~ ]$kubectl get pods
NAME       READY STATUS  RESTARTS AGE
python-pod 1/1   Running 0        7s

To submit a yml file to run a job:
[36 ewalter@cm ~ ]$kubectl apply -f webjob.yml
job.batch/python-job created

To check the status of the job or the pod within the job:

[6 ewalter@cm ~ ]$kubectl get jobs
NAME       COMPLETIONS DURATION AGE
python-job 0/1         43s      43s

[7 ewalter@cm ~ ]$kubectl get pods

NAME             READY STATUS  RESTARTS AGE
python-job-h2bdt 1/1   Running 0        45s

To open a bash shell within the pod:
[38 ewalter@cm ~ ]$kubectl exec -it python-job-h2bdt -- /bin/bash

To delete the job or pod:
You can simply run kubectl delete -f on the yml file which created the resource:

[62 ewalter@cm ~ ]$kubectl delete -f webjob.yml
job.batch "python-job" deleted

This will delete the job and any pods it spawned.

Alternatively, you can kill individual jobs or pods from their name:

[63 ewalter@cm ~ ]$kubectl get pods
NAME             READY STATUS  RESTARTS AGE
python-job-5xftj 1/1   Running 0        54s

[64 ewalter@cm ~ ]$kubectl delete pod python-job-5xftj
[65 ewalter@cm ~ ]$kubectl get pods

NAME             READY STATUS      RESTARTS AGE
python-job-5xftj 1/1   Terminating 0        56s
 

kubectl describe
To get more detailed information on the status of a pod or job, use the describe command. This can be useful for seeing why a pod isn't starting. As an example, let's change the volumes section of the job to try to mount a folder that does not exist:

volumes: # Define available volumes
  - name: data10 
    nfs: 
      server: lunar 
      path: /sciclone/data1/ewalter # THIS PATH DOESN'T EXIST 

Once the job is launched, you will see that the pod doesn't start:

[53 ewalter@cm ~ ]$kubectl get pod
NAME             READY STATUS            RESTARTS AGE
python-job-wbkxb 0/1   ContainerCreating 0        31s

And you can find out why if you describe the pod:

[73 ewalter@cm ~ ]$kubectl describe pod python-job-xr2rx
Name: python-job-xr2rx
Namespace: ewalter
Priority: 0
Service Account: default
Node: cdsw00.geo.sciclone.wm.edu/128.239.59.149
.
 .
.
Events:

Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4s default-scheduler Successfully assigned ewalter/python-job-xr2rx to cdsw00.geo.sciclone.wm.edu
Warning FailedMount 0s (x4 over 4s) kubelet MountVolume.SetUp failed for volume "data10" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs lunar:/sciclone/data1/ewalter /var/lib/kubelet/pods/8cfa19d0-1833-461e-8e1a-1bf5d5153f2f/volumes/kubernetes.io~nfs/data10
Output: mount.nfs: access denied by server while mounting lunar:/sciclone/data1/ewalter

Which shows us that the pod couldn't start because the mount of lunar:/sciclone/data1/ewalter got permission denied (because the path doesn't exist).

 

kubectl logs:

If the pod starts, it may still crash due to other errors inside the pod. For instance, suppose I change the command in the webjob.yml definition to:

containers:
  - name: python 
    image: laudio/pyodbc
    command: ["/bin/sh", "-ec", "unknown_command"]

This pod will start, but it will crash because the command unknown_command does not exist inside the container:

[16 ewalter@cm ~ ]$kubectl get pod
NAME             READY STATUS           RESTARTS   AGE
python-job-xng74 0/1   CrashLoopBackOff 1 (3s ago) 6s

To get more information about the pod, we can look at the pod logs:

[17 ewalter@cm ~ ]$kubectl logs python-job-xng74
/bin/sh: 1: unknown_command: not found 

This shows that the command unknown_command could not be found, which is why the pod crashes.

Other useful commands

Besides kubectl, there are commands that show information about the state of the k8s cluster and current GPU usage:

getnodestats 

[90 ewalter@cm ~ ]$getnodestats
Data serialization date: 2026-01-20 17:10:19
Resources are in Allocated/Capacity format.
-----------+--------+---------+------+------------+-------------+--------+------------
Node       | CPU    | MEM     | GPU  | GPU Type   | Schedulable | Public | Infiniband
-----------+--------+---------+------+------------+-------------+--------+------------
gu07       | 1/32   | 1/125   | 0/2  | NVIDIA-A40 | True        | True   | True
gu08       | 17/32  | 61/125  | 1/2  | NVIDIA-A40 | True        | True   | True

 

This output lists all k8s nodes and their current CPU, memory, and GPU allocations. Also listed are the GPU type, whether the node is currently able to run pods/jobs, whether a general user can spawn on the node, and whether it can mount filesystems over Infiniband.

getgpuusage

Data serialization time: 2026-01-20 13:48:06
------------+-------------------+------------------+-------+-------+-------+------------
namespace   | pod name          | hours            | node  | #CPU  | #GPU  | GPU type
------------+-------------------+------------------+-------+-------+-------+------------
<user1>     | python-job-p8wgq  | 0 Days 20 Hours  | gu07  | 16.0  | 2     | NVIDIA-A40
<user2>     | coconut6-r75wh    | 2 Days 4 Hours   | gu08  | 16.0  | 1     | NVIDIA-A40

 

This output lists the currently running pods on the system: the user running the pod, the name of the pod, how long it has been running, the node it is running on, the CPU and GPU resources being used, and the type of GPU.

Accessing storage directories on K8s


NFS mount - users are able to mount any filesystems that are accessible from the Slurm batch cluster (excluding /sciclone/pscr) into their K8s pods/jobs.

NOTE:  The scratch filesystems (scr10, scr20, scr30) should be used for job/pod outputs. home/comet has limited space and should be reserved for code and text files. data10/lunar should only be used to archive results after running jobs or to store frequently used data sets and large input files. The proj-ds filesystem is reserved for data-science projects, and acceptable use is up to the Data Science school.

Examples for the major filesystems available to the k8s cluster, for mounting over 1Gb Ethernet:

     volumes: 
       - name: home 
         nfs: 
           server: comet
           path: /sciclone/home/ewalter 
       - name: data10 
         nfs: 
           server: lunar
           path: /sciclone/data10/ewalter 
       - name: scr10 
         nfs:
           server: scr10
           path: /sciclone/scr10/ewalter 
       - name: scr20 
         nfs:
           server: scr20 
           path: /sciclone/scr20/ewalter
       - name: scr30 
         nfs:
           server: scr30 
           path: /sciclone/scr30/ewalter 
       - name: proj 
         nfs:
           server: proj-ds 
           path: /sciclone/proj-ds/geograd/ewalter 

This volumes section will allow all of a user's working directories to be mounted within the pod image.   

For some nodes, Infiniband is enabled, which allows for a more performant mount of storage. The getnodestats command lists whether a node supports Infiniband mounts. To mount over Infiniband, simply add the '-ib' suffix to the server name in the volumes section:

- name: scr30
  nfs:
    server: scr30-ib
    path: /sciclone/scr30/ewalter

An alternative to using NFS-mounted storage is to request a Persistent Volume Claim (PVC), which lives within the k8s global storage space via rook/ceph. Users must request a PVC to be created by the RC staff via email to hpc-help.

Once a PVC is created for you, you will be able to list your PV (Persistent Volume), which can be mounted over 1Gb Ethernet in any image on any k8s node:

[75 ewalter@cm ~ ]$kubectl get pvc
NAME       STATUS VOLUME                                   CAPACITY ACCESS MODES STORAGECLASS    AGE
ewalter-pv Bound  pvc-78b5d9a9-9247-42bf-87db-3cec266b1fed 15Gi     RWX          ceph-filesystem 6d21h

Then, to use this PV, you can mount it within the pod section of your yml file:

volumes:
  - name: ceph1
    persistentVolumeClaim:
      claimName: ewalter-pv # pre created pvc
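As with NFS volumes, the container must also reference the volume under volumeMounts for it to appear in the container's filesystem; a sketch (the mount path is illustrative):

```yaml
containers:
  - name: python
    image: laudio/pyodbc
    volumeMounts:
      - name: ceph1            # must match the volume name declared under volumes
        mountPath: /tmp/ceph   # illustrative path inside the container
```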

Note: PVs are not accessible from the Slurm/batch clusters.

Getting images for K8s

You can choose the image you want from a cloud-based host like Docker Hub, or build your own image locally (via a utility like podman or docker). For now, we’ll just use “off the shelf” images from Docker Hub, which is the default location our Kubernetes cluster pulls from.   One thing to remember is that all jobs/pods will run under your username without root permissions.  Therefore, the pod must be usable with this constraint.