Non-Euclidian Space

Run Jupyter on an HPC cluster

Introduction

So you’ve got access to an HPC cluster with GPUs for your machine learning work - lucky you! But how do you actually use it for running Jupyter notebooks?

At a high level, a typical HPC workflow goes like this: you open your laptop and SSH into a login node (let’s call it ui-1). You then submit your compute task to the scheduler queue, and it runs on some worker node, say worker-1.

To fit Jupyter into this workflow, launch JupyterLab as an interactive job, then set up SSH port forwarding to open Jupyter in your browser as if it were running on your local machine.

Here’s how your setup looks on a network level:

[your-machine] <-- SSH --> [ui-1] <-- SSH --> [worker-1 (JupyterLab)]

Assumptions

  • You have already set up a virtual Python environment with Jupyter and your machine learning packages.
  • The cluster uses the Portable Batch System (PBS) for job scheduling.
  • If your cluster uses a different setup, the core concepts are likely the same; only the command syntax will differ.
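Not sure which scheduler your cluster runs? A quick way to check from the login node is to look for the standard client commands (qsub for PBS-style schedulers, sbatch for Slurm); this is only a heuristic, since sites sometimes install wrapper commands:

```shell
# Report which scheduler client, if any, is on the PATH.
detect_scheduler() {
  if command -v qsub >/dev/null 2>&1; then
    echo "PBS-style scheduler (qsub found)"
  elif command -v sbatch >/dev/null 2>&1; then
    echo "Slurm (sbatch found)"
  else
    echo "no known scheduler client on PATH"
  fi
}
detect_scheduler
```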

Setting up browser-based access

Step 1: SSH into ui-1 and launch a JupyterLab server inside an interactive worker job:

# replace with your own username, server address, and SSH key path
ssh username@ui-1 -i path-to-private.key
# gets you one node with 4 CPUs and a single GPU.
qsub -I -l nodes=1:ppn=4:gpus=1

Tip: Note the assigned worker node name. You will need it to set up port forwarding from the worker to ui-1 in subsequent steps. For now, we assume it’s worker-1.
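If you missed the node name when the job started, you can recover it from inside the interactive session. A small sketch, assuming PBS sets the usual $PBS_NODEFILE variable (it may be absent on other schedulers, so the check is guarded):

```shell
# Print the node this interactive job landed on:
hostname

# PBS also records the allocated node(s) in a file, one entry per slot;
# sort -u collapses the duplicates down to the node names.
if [ -n "${PBS_NODEFILE:-}" ]; then
  sort -u "$PBS_NODEFILE"
fi
```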

Once the interactive session starts, run:

module load conda
source activate my_env
jupyter lab --port=1337 --no-browser --ip=0.0.0.0

Tip: note the authentication token Jupyter just printed in the terminal. You’ll use it for browser-based access.
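If the token has scrolled out of view, you can usually recover it later from the same environment on the worker node: `jupyter server list` prints running servers together with their token URLs (on older installations, `jupyter notebook list` is the equivalent). Guarded here in case the environment isn’t active:

```shell
# List running Jupyter servers and their token URLs.
if command -v jupyter >/dev/null 2>&1; then
  jupyter server list
else
  echo "jupyter not on PATH - activate your environment first"
fi
```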

Step 2: Open another shell session on your machine and start this SSH tunnel:

ssh -L 1337:localhost:1337 your_user@ui-1

Keep this session open: it forwards your local port 1337 to ui-1:1337. At this point, though, nothing is listening on ui-1’s localhost:1337 yet. So, from within this session, SSH into worker-1 with a second tunnel:

ssh -L 1337:localhost:1337 your_user@worker-1
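The two hops can also be collapsed into a single command if your OpenSSH client supports jump hosts (the ProxyJump/-J option, OpenSSH 7.3+) and ui-1 can resolve worker-1 by name. A sketch of the relevant ~/.ssh/config entries, with hypothetical host names, user, and key path:

```
# ~/.ssh/config (hypothetical entries - adjust user, hosts, and key path)
Host ui-1
    User your_user
    IdentityFile ~/.ssh/your-private-key

Host worker-1
    User your_user
    ProxyJump ui-1
```

With this in place, `ssh -L 1337:localhost:1337 worker-1` opens the whole your-machine → ui-1 → worker-1 tunnel in one session.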

Step 3: Open localhost:1337 on your machine. Find the authentication token in the terminal window running JupyterLab.

Now:

  • JupyterLab runs on worker-1
  • ui-1:1337 forwards to worker-1:1337
  • your-machine:1337 forwards to ui-1:1337
  • The full tunnel path is:
your-machine → ui-1 → worker-1

Gotchas

Gotcha: Port is busy!

Since ui-1 and worker-1 are shared by many users, port 1337 may well be taken. Pick a random high port (above 1024) that is unlikely to be used by other software or people; privileged or well-known ports such as 53 or 80 would be a bad idea :)
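One way to sidestep collisions is to let the OS hand you a free ephemeral port. A small sketch, assuming python3 is available on the node:

```shell
# Ask the kernel for a free port by binding to port 0, then release it.
pick_port() {
  python3 - <<'EOF'
import socket
s = socket.socket()
s.bind(("", 0))            # port 0: the OS picks a free ephemeral port
print(s.getsockname()[1])
s.close()
EOF
}
PORT=$(pick_port)
echo "use port $PORT for jupyter lab and both tunnels"
```

There is a small race between releasing the port and Jupyter binding it, but in practice this works fine; just use the same $PORT in the jupyter lab --port flag and in both ssh -L commands.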

Gotcha: My batch job gets killed!

If you notice JupyterLab shutting down unexpectedly with the message received signal 15, stopping, the job scheduler may be the culprit. In my case, the cluster management system killed interactive jobs that didn’t declare an expected duration or a queue. Submit your job with a queue or an expected walltime specified, e.g. -q fast lets me run a notebook for 60 minutes.
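Concretely, the interactive submission from Step 1 can be extended with a queue and a walltime. A hypothetical example, assuming your site defines a fast queue; queue names and limits are site-specific, so check your cluster’s documentation:

```
# one node, 4 CPUs, 1 GPU, in the "fast" queue, for at most 60 minutes
qsub -I -q fast -l nodes=1:ppn=4:gpus=1 -l walltime=01:00:00
```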

Acknowledgements

A special thanks goes to Lauris Cikovskis for the insightful discussions that helped shape this post.