Run Jupyter on an HPC cluster
Introduction
So you’ve got access to an HPC cluster with GPUs for your machine learning work - lucky you! But how do you actually use it for running Jupyter notebooks?
At a high level, a typical HPC workflow goes like this: you open your laptop and `ssh` into a login node (let’s call it `ui-1`). You then submit your compute task to the scheduler queue, and it runs on some worker node, `worker-1`.
To fit `Jupyter` into this workflow, launch `JupyterLab` as an interactive job, then set up SSH port forwarding to open `Jupyter` in your browser as if it were running on your local machine.
Here’s how your setup looks on a network level:
[your-machine] <-- SSH --> [ui-1] <-- SSH --> [worker-1 (JupyterLab)]
Assumptions
- You have already set up a virtual Python environment with `Jupyter` and machine learning packages;
- The cluster uses Portable Batch System (PBS) for job scheduling.

If your cluster uses a different setup, the core concepts are likely the same; it’s just the command syntax that will differ.
Setting up browser-based access
Step 1: `ssh` into `ui-1` and launch a `JupyterLab` server inside an interactive worker job:
# replace with your own username, server address, and SSH key path
ssh username@ui-1 -i path-to-private.key
# gets you one node with 4 CPUs and a single GPU.
qsub -I -l nodes=1:ppn=4:gpus=1
Tip: note the assigned worker node name; you will need it to set up port forwarding from the worker to `ui-1` in subsequent steps. For now, we assume it’s `worker-1`.
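If the scheduler’s output doesn’t show the node name, you can print it from inside the interactive session. This assumes, as on many PBS clusters, that the node’s hostname matches the name the scheduler uses:

```shell
# print this node's hostname; on many clusters it matches
# the scheduler's node name (e.g. worker-1)
hostname
```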
Once the interactive session starts, run:
module load conda
source activate my_env
jupyter lab --port=1337 --no-browser --ip=0.0.0.0
Tip: note the authentication token `Jupyter` just printed in the terminal. You’ll use it for browser-based access.
Step 2: open another shell session and start this `ssh` tunnel:
ssh -L 1337:localhost:1337 your_user@ui-1
Keep this session open: it forwards your local port `1337` to `ui-1:1337`. But at this point, `ui-1` doesn’t have anything listening on `localhost:1337`. So, within this session, `ssh` into `worker-1` using a second tunnel:
ssh -L 1337:localhost:1337 your_user@worker-1
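As an alternative to chaining two tunnels by hand, OpenSSH’s `-J` (ProxyJump) option can set up both hops in one command from your machine. This is a sketch assuming OpenSSH 7.3+ and that `worker-1` accepts your key when reached via `ui-1`:

```shell
# jump through ui-1 and forward local port 1337 straight to worker-1
ssh -J username@ui-1 -L 1337:localhost:1337 username@worker-1
```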
Step 3: open `localhost:1337` on your machine and log in with the authentication token from the terminal window running `JupyterLab`.
Now:
- `JupyterLab` runs on `worker-1`;
- `ui-1:1337` forwards to `worker-1:1337`;
- `your-machine:1337` forwards to `ui-1:1337`;
- the full tunnel path is: `your-machine → ui-1 → worker-1`.
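To sanity-check the chain before opening a browser, you can request Jupyter’s HTTP headers through the tunnel (a hypothetical check; assumes `curl` is installed on your machine):

```shell
# if all hops are up, this prints Jupyter's HTTP response headers
curl -sI http://localhost:1337
```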
Gotchas
Gotcha: Port is busy!
Since `ui-1` and `worker-1` are shared by many users, port `1337` may already be taken. Pick a random high port that is unlikely to be in use by other software or people; well-known ports like `53` or `80` would be a bad idea :)
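One way to do this is to draw a random port from the unprivileged range and check that nothing is listening on it. This sketch assumes `shuf` (coreutils) and `ss` (iproute2) are available on the node:

```shell
# draw a random port from 20000-65000 and check nothing listens on it
PORT=$(shuf -i 20000-65000 -n 1)
if ss -tln | grep -q ":$PORT "; then
  echo "port $PORT is busy, try again"
else
  echo "port $PORT looks free"
fi
```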
Gotcha: My batch job gets killed!
If you notice `JupyterLab` shutting down unexpectedly with the message `received signal 15, stopping`, it might be the job scheduler. In my case, the cluster management system shut down interactive jobs that didn’t declare an expected duration or a queue. Submit your job with the queue or expected walltime specified; e.g. `-q fast` lets me run a notebook for 60 minutes.
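For example, the interactive request from Step 1 could be resubmitted with an explicit queue and walltime. The queue name and limits here are assumptions; check your site’s documentation for the actual values:

```shell
# same resources as before, plus a queue and a one-hour walltime
qsub -I -q fast -l nodes=1:ppn=4:gpus=1,walltime=01:00:00
```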
Acknowledgements
A special thanks goes to Lauris Cikovskis for the insightful discussions that helped shape this post.