A quick guide for High Performance Computing (HPC) and Slurm

Author

Erik Skare

Published

March 16, 2023

Using High Performance Computing (HPC)

High Performance Computing (HPC) is becoming increasingly important as we process, analyze, and perform complex calculations on ever larger amounts of data. HPC uses clusters of powerful processors that work in parallel at extremely high speeds. Instead of spending days processing data on a regular computer, you can use an HPC system, which typically performs at speeds far greater than the fastest commodity desktop, laptop, or server.

The University of Oslo has its own powerful HPC cluster, Fox, which is available to all users of the Educloud Research infrastructure. You can apply for access HERE if you are affiliated with the university. Once you have created an account, set up Educloud with the University of Oslo’s VMware Horizon Client.

Do not log into the Windows desktop through VMware Horizon. Log into the Linux Fedora desktop (the one to the left) instead: the command prompt works with the Fedora version of Educloud when you log into Fox, and Fox will not find the files you try to process if you put them on your Educloud Windows desktop.

Logging into Fox

The following steps assume that you have created a user in Educloud, that your name is Ola Nordmann, and that you have the username ec-olanor. For a quick guide on how to set up Educloud Research for the first time, click HERE.

In order to log into Fox, open your Command Prompt (cmd.exe) and type ssh followed by your Educloud username. For example, ssh ec-olanor@fox.educloud.no. You will then be prompted for a One-Time Password (from an authentication app such as Authy) and your personal password. You are now logged into the Fox supercomputer and should see something like this:

[ec-olanor@login-4 ~]$

Transferring files to Educloud Research

There are three main ways to transfer files to your Educloud desktop:

  1. Once you have logged into your Educloud desktop through VMWare Horizon, simply drag the files from your local desktop to Educloud.

  2. I have had issues with dragging files over to the Educloud desktop when working with large files or batches of files. Another approach is to use the filesender service at the University of Oslo (exclusive to UiO users).

  3. It may, at the outset, seem most convenient to use the drag method or the UiO filesender. I have, however, faced several issues with these methods (corrupted files, files that won’t download etc.), so I have started using the scp command instead, which is one of the most convenient ways to transfer files to the Educloud desktop. Open the command prompt (cmd.exe) on your local computer (where the files you want to transfer are stored) and type scp followed by [full path to the file you want to transfer, including the file name] [your full Educloud account name]:[full path to where you want to save the file in Educloud]. The full scp command would then look something like this: scp /C:/Users/olanor/file.txt ec-olanor@fox.educloud.no:/fp/homes01/u01/ec-olanor. You will be prompted to enter your One-Time Code and your personal password. Use -r if you want to transfer an entire directory, like this: scp -r /C:/Users/olanor/directory_with_files ec-olanor@fox.educloud.no:/fp/homes01/u01/ec-olanor.

There can be issues when providing full file paths on UiO computers, as the OneDrive folder name contains spaces (C:/Users/olanor/OneDrive - Universitetet i Oslo/Skrivebord). I fixed this issue the lazy way by simply moving the files to C:/Users/olanor.
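Alternatively, you can keep the files where they are and wrap the path in quotes so that scp treats it as a single argument. A minimal sketch, assuming the same example user and a hypothetical file name:

scp "C:/Users/olanor/OneDrive - Universitetet i Oslo/Skrivebord/file.txt" ec-olanor@fox.educloud.no:/fp/homes01/u01/ec-olanor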

Slurm

Let’s say I want to preprocess a 4GB .csv file containing 5,000 newspaper issues by using Natural Language Processing (NLP). This means lemmatizing the text; removing stopwords, numbers, punctuation, and symbols; normalizing Unicode characters, etc. Usually, I would use a Python script (or R) to do so on my regular laptop, but this would take roughly a week given the size of the .csv file (or 8147 minutes and 51.9 seconds to be precise). It would simply be unfeasible to do so if you have 6 or more of those .csv files.

This is where the Slurm Workload Manager - formerly known as the Simple Linux Utility for Resource Management - comes in. In simplified terms, Slurm is an open-source software package for submitting, scheduling, and monitoring jobs on large computer clusters like Fox. Essentially, when we want Fox to execute our Python script (or R script), we use Slurm to submit the job, schedule it (as there is usually more than one person using Fox), and monitor it (as we wait for the job to be processed). While some deep learning engineers squirm at Slurm because it lacks some features they consider essential, it is highly useful for us mere mortals because it can be configured in a couple of minutes for easy (but time-consuming) tasks.

When using Slurm, we write what is called a Slurm script (a .slrm file). In this script, we provide all the information required to submit a job to the HPC: for example, the number of CPUs or GPUs we need, how much memory our task requires, and where the HPC can find the Python or R script that we want to run.

The Slurm script

To create (or later open) a .slrm file, log into Fox in your command prompt (again, the command to log in is ssh ec-olanor@fox.educloud.no).

Then write the following command:

nano [PREFERRED NAME OF THE SCRIPT].slrm

(for example, nano my_slurm_script.slrm).

You run the same command if you want to edit the .slrm file later. Do not worry if you have forgotten the name of your .slrm file; you can find it in the $HOME directory. The first time you open your .slrm file, it will be empty, so you need to fill in all the information necessary to run the job. Once you have edited the .slrm file and want to close it, just type CTRL+X followed by either Y (for “Yes, I want to save my changes”) or N (for “No, I do not want to save my changes”). You will then get the option to change the name of your .slrm file. If you like the name as it is, just press ENTER to exit.
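As mentioned, a forgotten .slrm file will be sitting in your $HOME directory. A minimal sketch for listing all the .slrm files there:

ls $HOME/*.slrm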

Assuming you have filled in all necessary information in the .slrm file, the first part should look something like this:

#!/bin/bash

#SBATCH --account=ec145
#SBATCH --job-name="slurm_job"
#SBATCH --partition=normal
#SBATCH --time=1-12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28
#SBATCH --mem=450G 
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --mail-user=ec-olanor@fox.educloud.no
#SBATCH --mail-type=ALL
#SBATCH --requeue

There is a bit to unpack here:

  1. The first part of the code - #!/bin/bash - is what we in Linux parlance call a shebang, and we need it for the Slurm script to work as it should. In simplified terms, the shebang gives the full path of the shell (here, bash) that should interpret the script, so that the right interpreter is used wherever the script is run. You never have to change this bit, so do not worry too much about it.

  2. “SBATCH” lines belong to sbatch, which submits a batch job asynchronously (asynchronous in the sense that it simply queues a script to be run on allocated resources at some point in the future; it returns with success once the job has been successfully queued). Through sbatch, the user submits a script for later execution. The first and second lines, #SBATCH --account=ec145 and #SBATCH --job-name="slurm_job", tell Slurm which account is running the script and what you have chosen to call the job you have submitted (so you can recognize your job in the queue). You can call the job whatever you want (for example “slurm_job”); in that case, you just write #SBATCH --job-name="slurm_job". Remember, your account is not the same as your username in Educloud. The account is always ec[DIGITS]. If you are unsure of what your account is, log into Educloud HERE to find out.

  3. Some of the job types we run on Fox are implemented as partitions, which means that you select the job type by specifying a partition. Click HERE to read more about the job types on Fox. For now, it suffices to note that most jobs submitted to Fox are normal jobs, so in 99 percent of the cases in the humanities/social sciences, #SBATCH --partition=normal is all you need.

  4. #SBATCH --time=1-12:00:00 sets a limit on the total run time of the job allocation. When the time limit is reached, each task in each job step is terminated even if your job has not finished. --time= essentially tells Fox: “Run my job for this long, and stop working on it if it needs more than this”. In this case, #SBATCH --time=1-12:00:00 sets 1 day and 12 hours as the time limit ([days]-[hours]:[minutes]:[seconds]). If Fox has not finished the job within 36 hours, it will stop. Remember, do not ask for more time than you need. If you request, let’s say, 900GB of memory, 580 CPUs, 5 nodes, and a run time of 7 days, your job may be pending for a couple of weeks while it waits for the necessary resources to become available.

  5. The next lines tell Slurm how many compute nodes, how many tasks, how many CPUs per task, and how much memory you require. If you have a single task that can be run independently, you can specify a single task using #SBATCH --ntasks-per-node=1. If your workload can be parallelized across multiple tasks, you can change the number, for example to #SBATCH --ntasks-per-node=4. The more complicated your task is, the more nodes, CPUs, and memory you will need. Remember, the more you require from the HPC, the longer you will have to wait in the queue while other jobs are processed. More is not always more effective, and there is often some trial and error in finding the optimal allocation of resources for your job. You can also provide #SBATCH --mem-per-cpu=4G (or 10G for that matter) instead of #SBATCH --mem=450G; see the sketch after this list. #SBATCH --mem-per-cpu= sets the amount of memory required for each CPU, and the total memory for your job is then calculated from the number of tasks specified by #SBATCH --ntasks and #SBATCH --ntasks-per-node. #SBATCH --mem= allows you to specify the exact amount of memory required for your job, regardless of the number of CPUs or tasks. This option gives you more control over the memory requirements of your job and can be useful when you have a good idea of how much memory your task needs and want to ensure that your job has enough to run.

  6. Both standard output and standard error are by default directed to a file where the “%j” in the name is replaced with the job allocation number (for example 017982). In this case, #SBATCH --output=%j.out will save a standard output file called 017982.out in your $HOME directory, and #SBATCH --error=%j.err the corresponding error file. It may be easier to recognize the output files if you add a little more information to their names when you run several jobs, for example #SBATCH --output=job1_%j.out and #SBATCH --output=job2_%j.out.

  7. If you provide #SBATCH --mail-user= ... and #SBATCH --mail-type=ALL, you will receive an email when your job begins, completes, fails, etc. The mail types you can receive are: INVALID_DEPEND, BEGIN, END, FAIL, REQUEUE, and STAGE_OUT.
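As mentioned in point 5, you can ask for memory per CPU instead of a fixed total for the job. A minimal sketch of what those resource lines could look like instead (the values are illustrative, not a recommendation):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28
# 4G per allocated CPU; with 28 CPUs this adds up to 112G for the whole job
#SBATCH --mem-per-cpu=4G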

See the section “Useful Slurm commands” below for, well, useful Slurm commands.

Setting up Python in Fox

When you log into Fox using ssh ec-olanor@fox.educloud.no, you will see the following below “Welcome to FOX”:

Current resource situation on the interactive machines, update [some date]:
        memory    (GiB)           load                /localscratch (GiB)
name    total     free    1-min   5-min    15-min     total  used    %used
int-1    1007      804     2.02   2.02      2.02      5959    882     15%
int-2    1007      674     5.00   5.02      6.08      5959   2240     38%
int-3    1007      719     6.40   9.07      5.54      5959    240      5%
int-4    1007      897     4.00   4.13      5.69      5959    668     12%

This overview shows the current resource situation on the interactive machines that you need to log into. Log into the least busy machine (the one with the most free memory in GiB) by running ssh [least busy interactive machine name]. In the case above, this would be ssh int-4. You should then see [ec-olanor@login-2 ~]$ change into [ec-olanor@int-4 ~]$.

In order to run most Python scripts, you will need to install some packages (for example, pandas or regex - or, in my case, spacy) so that Fox can run all of the commands in your Python script. When working with Fox, the solution is to install packages in a virtual Python environment (Read more about Python environments HERE and about Python packages HERE).

There are two ways to set up Python and related environments in Educloud. One way (with which I struggled immensely) is the following:

# First load an appropriate Python module (use 'module avail Python' to see all)
$ module load Python/3.8.6-GCCcore-10.2.0
# Create the virtual environment.
$ python -m venv my_new_pythonenv
# Activate the environment.
$ source my_new_pythonenv/bin/activate
# Install packages with pip. Here we install pandas.
$ python -m pip install pandas

The other way is the one set up by Thomas Hegghammer (thank you for saving me, Thomas), so the following is all due to his efforts. The only thing I have done is to modify the Anaconda part so it works with Fox. For this walkthrough, we will use Fox to run this SpaCy script to lemmatize a Norwegian .csv file. Just remember to use Option 2 “with parallelization” when processing it through Fox:

Step 1: Loading Anaconda

In case there are Python/Anaconda modules already loaded (which can complicate things), run the following command:

module purge

Then check what version of Anaconda is available:

module avail conda

As of February 8, 2023, you will get two options: Anaconda3/5.3.0 or Miniconda3/4.9.2. Load the former:

module load Anaconda3/5.3.0

Step 2: Trying (and failing) to activate interactive shell

According to Educloud, you will need to run the following command to run your shell interactively:

conda init bash

Yet, once you do, you will get the error message CommandNotFoundError: No command 'conda init'. This happens whether you load Anaconda on the login node or on an interactive one. Don’t worry, we will work our way around this problem by modifying the Slurm script later.

Step 3: Creating a Conda environment

We want to set up a Conda environment in the $HOME directory (the full file path is /fp/homes01/u01/ec-olanor, which is what $HOME expands to) so other nodes can access it later.

Going forward you will likely set up more environments for other tasks, so it’s worth creating a directory called envs and setting up norlem (or whatever you want to call your environment) there. We will work with Python 3.9, so we have to specify the version when we create the environment:

conda create -p $HOME/envs/norlem python=3.9

You will then see:

## Package Plan ##

   environment location: /fp/homes01/u01/ec-olanor/envs/norlem
   
Proceed ([y]/[n])?

Type y if the environment location is correct, which will produce:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#      $ conda activate /fp/homes01/u01/ec-olanor/envs/norlem
#
# To deactivate an active environment, use
#      $ conda deactivate

Remember, you can also activate the environment by running conda activate $HOME/envs/norlem. You will then see bash-4.4$ change to (norlem) bash-4.4$, which means you have successfully activated your environment.

Step 4: Installing packages in the environment

Now you can install the packages:

conda install pandas regex pandarallel

Alternatively, if the conda install does not work, try running:

python -m pip install [PACKAGE NAME]

In this case:

python -m pip install spacy

Once pandas, regex, pandarallel, and SpaCy have been installed, download the Norwegian language model nb_core_news_lg:

python -m spacy download nb_core_news_lg

This works perfectly fine with English, Japanese, or Chinese .csv files, too. Just download the appropriate language model from SpaCy, such as zh_core_web_lg (for Chinese), instead.

To see what packages you have installed in your environment, run:

conda list

P.S. Try to avoid storing more data in Educloud than you need. I had transferred several GBs of data to Educloud that I had no use for, and got the error message ERROR: Could not install packages due to an OSError: [Errno 122] Disk quota exceeded. This was often fixed by running pip3 cache purge, but your life will be a lot easier if you only store the data you want to process and nothing else. So delete unnecessary files once they are processed.
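In practice, the clean-up boils down to something like this (a minimal sketch; the file name is just a hypothetical example of input data you no longer need):

# free pip's download cache if you hit the disk quota
pip3 cache purge
# remove input files that have already been processed (hypothetical file name)
rm $HOME/input/old_input_file.csv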

Adding Python to your .slrm file

In order for Fox to run your Python script and interact with your newly established Python environment, we will need to add some more info to the .slrm script we created earlier (you are already familiar with the first #SBATCH lines).

We will first make Fox load the Anaconda module:

module load Anaconda3/5.3.0

Then set ${PS1} (needed when sourcing the Anaconda setup script):

export PS1=\$

Do you remember how we struggled to activate an interactive shell when creating our environment? The following line in the slurm script fixes this problem:

source /cluster/software/rhel8/easybuild/software/Anaconda3/5.3.0/etc/profile.d/conda.sh

Deactivate any spillover environment from the login node: conda deactivate &>/dev/null

Next, activate your environment by providing the full path:

conda activate /fp/homes01/u01/ec-olanor/envs/norlem

Then provide the full path to the Python script you want Fox to run (a .py file, in this case called aftenposten_unprocessed.py):

python /fp/homes01/u01/ec-olanor/scripts/aftenposten_unprocessed.py

P.S. This assumes that you have a specific folder for your Python scripts in the $HOME directory. To create a new directory in the command prompt, run:

mkdir $HOME/scripts

In the end, your .slrm file should look something like this:

#!/bin/bash

#SBATCH --account=ec145
#SBATCH --job-name="slurm_job"
#SBATCH --partition=normal
#SBATCH --time=1-12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96
#SBATCH --mem=450G 
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --mail-user=ec-olanor@fox.educloud.no
#SBATCH --mail-type=ALL
#SBATCH --requeue

module load Anaconda3/5.3.0
export PS1=\$
source /cluster/software/rhel8/easybuild/software/Anaconda3/5.3.0/etc/profile.d/conda.sh
conda deactivate &>/dev/null
conda activate /fp/homes01/u01/ec-olanor/envs/norlem
python /fp/homes01/u01/ec-olanor/scripts/aftenposten_unprocessed.py

Submitting your job and checking work status

Now that the .slrm file has all the information it needs, you can submit your job to Fox:

  1. To submit a job to Fox, run
sbatch [name of your .slrm file]

(For example, sbatch my_slurm_script.slrm)

If submitted correctly, you should receive the following message: Submitted batch job [JOB ID].

  2. Once your job is submitted, you can see where your job is placed in the queue of other jobs by running
squeue

or to only see the jobs you have submitted, run

squeue --user=ec-olanor

You will find information about the jobs that are running (or waiting to run):

  • JOBID: The ID of the different jobs.
  • PARTITION: The partition on which the jobs will run (this will always be the one you selected in your SLURM script).
  • NAME: The names of the jobs (this is why we provide it in the .slrm file: So that we can recognize our own job in the queue).
  • USER: The username of those who have submitted jobs to Fox.
  • STATUS (ST): The status of the jobs. Either R (for running), PD (for pending) or CG (for completing).
  • TIME: How long the different jobs have run.
  • NODES: The number of nodes the different jobs have requested.
  • NODELIST(REASON): This indicates on what node your job is running (for example gpu-2 or c1-9) or the reason why your job is pending. The reasons for a job pending are most often either “(Resources)” or “(Priority)”. “(Resources)” means that your job is pending because Fox is waiting for resources to become available while “(Priority)” means that your job is queued behind a higher priority job.
  3. For more detailed information about your job, run
scontrol show job [JOB ID]

(For example scontrol show job 170712)

This will give you far more detailed information about your job: When it was submitted (SubmitTime), estimated start time if pending (StartTime), and so on.
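Once a job has finished (and therefore no longer appears in squeue), you can still look it up in your job history with sacct (listed under “Useful Slurm commands” below). A minimal sketch, with an illustrative selection of format fields:

sacct -j 170712 --format=JobID,JobName,State,Elapsed,MaxRSS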

The same, but working with R

There are some slight differences when using Fox to process an R script because R does not work with modules/packages the same way that Python does (with environments etc.). Still, some things are the same.

Again, once you log into Fox using ssh ec-olanor@fox.educloud.no, you will see:

Current resource situation on the interactive machines, update [some date]:
        memory    (GiB)           load                /localscratch (GiB)
name    total     free    1-min   5-min    15-min     total  used    %used
int-1    1007      804     2.02   2.02      2.02      5959    882     15%
int-2    1007      674     5.00   5.02      6.08      5959   2240     38%
int-3    1007      719     6.40   9.07      5.54      5959    240      5%
int-4    1007      897     4.00   4.13      5.69      5959    668     12%

This information displays the available interactive nodes you need to log into in order to successfully load the R packages you download and install in your directory. Always choose the least busy interactive node (in the case above, “int-4”). Do so by running ssh int-4 (or, if int-3 had the most available memory, ssh int-3). Once you are logged into an interactive node, you should see [ec-olanor@login-1 ~]$ change into [ec-olanor@int-4 ~]$, indicating that you are no longer on the login node.

Now, once you are logged into the interactive node of your choice, check which R versions are available by running:

module spider R

You should see something like this:

----------------------------------------------------------------------------
  R:
----------------------------------------------------------------------------
    Description:
      R is a free software environment for statistical computing and
      graphics.

     Versions:
        R/3.6.0-foss-2019a
        R/3.6.2-foss-2019b
        R/4.0.0-foss-2020a
        R/4.1.2-foss-2021b
     Other possible modules matches:
        AdapterRemoval  Arrow  BRAKER  Brotli  CUDAcore  DendroPy  Exonerate  ...

----------------------------------------------------------------------------

To load your preferred R module (in this case the last one), run:

module load R/4.1.2-foss-2021b

and then run R with the following command:

R

Step 1: Installing R packages

Now, if you are familiar with R, you will know that you need to install various packages to run your R script. Let’s start by creating a directory in which to place your library and associated packages (mkdir is a shell command, so run it in the terminal before starting R):

mkdir $HOME/R

And then, inside R, set the location where the packages should be installed:

.libPaths(c("/fp/homes01/u01/ec-olanor/R",.libPaths()))

In simplified terms, .libPaths() returns a character vector of the directories in which R searches for packages. The expression .libPaths(c("/fp/homes01/u01/ec-olanor/R", .libPaths())) adds a new directory, /fp/homes01/u01/ec-olanor/R, to the front of the current search path. This means that R will search the /fp/homes01/u01/ec-olanor/R directory before it searches the other directories in the .libPaths() search path.

I tried to run the same command with /$HOME/R instead, but this caused some trouble finding the R directory (shell variables like $HOME are not expanded inside R strings). So always provide the full path when running the .libPaths() command.

We can then install our packages (for example quanteda):

install.packages("quanteda", repo="cran.uib.no")

If .libPaths(c("/fp/homes01/u01/ec-olanor/R",.libPaths())) worked correctly, you should see the following message in the command prompt followed by the installation of the package (this can take a couple of minutes):

Installing package into ‘/fp/homes01/u01/ec-olanor/R’
(as ‘lib’ is unspecified)

We can then check whether the packages were successfully installed:

library(quanteda)

which should give you the following (as of February 9, 2023):

Package version: 3.2.4
Unicode version: 13.0
ICU version: 66.1
Parallel computing: 8 of 8 threads used.
See https://quanteda.io for tutorials and examples.

Step 2: Running an R script

Now that all required packages are installed in our $HOME/R directory, we can run our first R script on Fox. We will call this R script my_r_script.rscript. When doing so, we have to provide full information so that Fox knows where to find our packages, the files we want to process, and where we want to store the output. It does not suffice to merely write library(quanteda) at the top of the R script. We need the beginning of our script to look like this:

#!/usr/bin/env Rscript
.libPaths(c("/fp/homes01/u01/ec-olanor/R",.libPaths()))
library(quanteda)

You may recognize the first line, #!/usr/bin/env Rscript, as a shebang (the same as in our Slurm script). In this case it tells Fox to run the file as an R script by specifying Rscript at the end of the shebang. The second line, .libPaths(c("/fp/homes01/u01/ec-olanor/R",.libPaths())), then tells Fox where quanteda is located. You then load quanteda by running library(quanteda).

The same applies to the files we want to process with our R script: provide the full path here as well, since R does not expand $HOME inside strings. Let’s say we have a .csv file with four columns: 1) doc_id, 2) text, 3) publication_date, 4) newspaper. We want to create a corpus out of this .csv file with Quanteda:

.libPaths(c("/fp/homes01/u01/ec-olanor/R",.libPaths()))
library(quanteda)

my_df <- read.csv("/fp/homes01/u01/ec-olanor/input/my_dataframe.csv")
my_corpus <- corpus(my_df$text, 
                    docnames = my_df$doc_id, 
                    docvars = data.frame(date = substr(my_df$publication_date, 0, 10),
                                         year = as.integer(substr(my_df$publication_date, 1, 4)),
                                         newspaper = my_df$newspaper))

Once the corpus is created, we have to save the output, again providing the full path to where we want to store it (make sure the output directory exists first; see the note after the full script below):

save(my_corpus, file = "/fp/homes01/u01/ec-olanor/output/my_corpus.Rda")

The entire R script should then look like this:

#!/usr/bin/env Rscript
.libPaths(c("/fp/homes01/u01/ec-olanor/R",.libPaths()))
library(quanteda)

my_df <- read.csv("/fp/homes01/u01/ec-olanor/input/my_dataframe.csv")

my_corpus <- corpus(my_df$text, 
                   docnames = my_df$doc_id, 
                   docvars = data.frame(date = substr(my_df$publication_date, 0, 10),
                                        year = as.integer(substr(my_df$publication_date, 1, 4)),
                                        newspaper = my_df$newspaper))

save(my_corpus, file = "/fp/homes01/u01/ec-olanor/output/my_corpus.Rda")
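Note that the script assumes the output directory already exists. A minimal sketch for creating it from the shell before submitting the job (the directory name output simply follows the example above):

mkdir -p $HOME/output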

In order to make Fox run this script, we will create a Slurm script similar to the one we used to process our Python script (this is not a very complicated script so we will ask for less from Fox):

#!/bin/bash

#SBATCH --account=ec145
#SBATCH --job-name="r_script"
#SBATCH --partition=normal
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=48G 
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --mail-user=ec-olanor@fox.educloud.no
#SBATCH --mail-type=ALL
#SBATCH --requeue

module purge
module load R/4.1.2-foss-2021b

Rscript /fp/homes01/u01/ec-olanor/scripts/my_r_script.rscript > error.Rout

Briefly explained, we purge any other modules before loading our preferred R module (R/4.1.2-foss-2021b), and then run the script with Rscript, redirecting its console output to the file “error.Rout”. If we want regular output and error messages in separate files (for example if we drop the save(my_corpus, file = "my_corpus.Rda") line and print the results to the console instead), we can write the Rscript line as:

Rscript my_r_script.rscript > output.Rout 2> error.Rout

Here, > output.Rout redirects the standard output of the R script to the file “output.Rout”, while 2> error.Rout sends any error messages to “error.Rout”.
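After the job has finished, you can inspect these files directly from the command prompt. A minimal sketch, assuming the file names above:

# show the last lines of the script's console output and of any error messages
tail output.Rout
tail error.Rout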

Useful Slurm commands

  • ssh: Log into the Fox supercomputer from the command prompt. For example ssh [Educloud username]@fox.educloud.no.

  • nano: Write/modify your slurm script in FOX. For example nano [preferred name of slurm script].slrm.

  • sbatch: Submit your slurm script to Fox. For example sbatch [name of your slurm script].slrm.

  • scontrol show job: Check job status after submitting. For example scontrol show job [JOB ID].

  • squeue --user=: Check status of all of your submitted jobs. For example squeue --user=ec-olanor.

  • squeue: Check the queue of submitted jobs on FOX.

  • sacct: Check the history of all your submitted jobs.

  • scancel: Cancel your current job. For example scancel [JOB ID].

  • module spider: Check for available versions of R or Python on FOX. For example module spider R.

  • module load: Load the preferred version of the software you need. For example module load R/<version> or module load Python/<version>.

  • sinfo: Show the status of all Slurm nodes and partitions on the cluster, including their state (idle, down, or allocated) and any associated properties.

  • sbatch --dependency=afterok: Allows you to specify that a job should only start after another job has successfully completed. You use the Job ID of the previous job as an argument to this option. For example sbatch --dependency=afterok:[JOB ID] [name of your slurm script].slrm.

  • srun: Allows you to run a command on a compute node directly, instead of submitting it as a batch job with sbatch. This can be useful for testing or debugging purposes. For example srun python python_script.py.

  • scancel -u: Allows you to cancel all jobs submitted by a specific user. For example: scancel -u ec-olanor

  • sview: Launches the Slurm graphical user interface, which provides a real-time view of the cluster’s status, including node utilization and job progress.
