Run Jupyter on an HPC cluster
Introduction
So you’ve got access to an HPC cluster with GPUs for your machine learning work - lucky you! But how do you actually use it for running Jupyter notebooks?
At a high level, a typical HPC workflow goes like this: you open your laptop and ssh into a login node (let’s call it ui-1). You then submit your compute task to the scheduler queue, and it runs on some worker node, say worker-1.
To fit Jupyter into this workflow, launch JupyterLab as an interactive job. Then set up SSH port forwarding to open up Jupyter in your browser as if it ran on your local machine.
Here’s how your setup looks on a network level:
[your-machine] <-- SSH --> [ui-1] <-- SSH --> [worker-1 (JupyterLab)]
Assumptions
- You have already set up a virtual Python environment with Jupyter and machine learning packages;
- The cluster uses the Portable Batch System (PBS) for job scheduling;
- If your cluster uses a different setup, the core concepts likely still apply, and only the command syntax will differ.
Setting up browser-based access
Step 1: ssh into ui-1 and launch a JupyterLab server inside an interactive worker job:
# replace with your own username, server address, and SSH key path
ssh -i path-to-private.key username@ui-1
# request an interactive job (-I) on one node with 4 CPUs and a single GPU
qsub -I -l nodes=1:ppn=4:gpus=1
Tip: Note the name of the assigned worker node. You will need it to set up port forwarding between ui-1 and the worker in subsequent steps. For now, we assume it’s worker-1.
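If the shell prompt doesn't make the node name obvious, you can print it from inside the interactive job. Below is a minimal sketch. It assumes a PBS/Torque setup that exports $PBS_NODEFILE, the file listing the hosts allocated to the job. The function name worker_node_name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the worker node's name from inside an interactive PBS job.
# PBS/Torque typically writes the allocated hosts (one line per requested core)
# to the file named by $PBS_NODEFILE. For a one-node job like this one,
# every line names the same node, so the first line is enough.
worker_node_name() {
  if [[ -n "${PBS_NODEFILE:-}" && -r "$PBS_NODEFILE" ]]; then
    head -n 1 "$PBS_NODEFILE"
  else
    # Fallback: bash's own HOSTNAME variable, with any domain suffix stripped.
    echo "${HOSTNAME%%.*}"
  fi
}
```

Paste the function into the job's shell and run worker_node_name. Use the printed name wherever the steps below say worker-1.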
Once the interactive session starts, run:
module load conda
source activate my_env
jupyter lab --port=1337 --no-browser --ip=0.0.0.0
Tip: note the authentication token Jupyter just printed in the terminal. You’ll use it for browser-based access.
Step 2: On your machine, open another shell session and start this SSH tunnel:
ssh -L 1337:localhost:1337 your_user@ui-1
Keep this session open: it forwards your local port 1337 to ui-1:1337. But at this point, nothing on ui-1 is listening on localhost:1337. So, within this session, ssh into worker-1 using a second tunnel:
ssh -L 1337:localhost:1337 your_user@worker-1
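With OpenSSH 7.3 or newer, you can collapse the two hops into a single command using -J (ProxyJump), which reaches worker-1 through ui-1. This is a swapped-in alternative, not the method described above. It assumes that worker-1 accepts SSH connections from ui-1, which the two-step version also requires. It further assumes that your key is accepted on worker-1 and picked up automatically, for example via ssh-agent or ~/.ssh/config. The helper build_tunnel_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a single-command tunnel your-machine -> ui-1 -> worker-1.
# Arguments (placeholders): user, login node, worker node, port.
build_tunnel_cmd() {
  local user="$1" login="$2" worker="$3" port="$4"
  # -N: forward ports only, don't start a remote shell
  # -L: local $port -> localhost:$port as seen from the worker
  # -J: jump through the login node to reach the worker
  echo "ssh -N -L ${port}:localhost:${port} -J ${user}@${login} ${user}@${worker}"
}
```

For example, build_tunnel_cmd your_user ui-1 worker-1 1337 prints ssh -N -L 1337:localhost:1337 -J your_user@ui-1 your_user@worker-1. Run that command on your machine, keep it open while you work, and press Ctrl-C to close the tunnel.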
Step 3: Open http://localhost:1337 in a browser on your machine. When asked for the authentication token, copy it from the terminal window running JupyterLab.
Now:
- JupyterLab runs on worker-1;
- ui-1:1337 forwards to worker-1:1337;
- your-machine:1337 forwards to ui-1:1337;
- The full tunnel path is: your-machine → ui-1 → worker-1.
Gotchas
Gotcha: Port is busy!
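Since ui-1 and worker-1 are shared by many users, port 1337 may well be taken already. So pick a random port that is unlikely to be in use by other software or people, ideally above 1024 (on Linux, binding ports below 1024 requires elevated privileges by default). Ports 53 (DNS) or 80 (HTTP) would be a bad idea :)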
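One way to choose such a port is to draw a random number from the dynamic range 49152-65535 and check that nothing on localhost already listens on it. Below is a minimal bash sketch with some assumptions and limits:
- It relies on bash's /dev/tcp redirection, which is enabled in most Linux builds.
- It only checks the machine it runs on. Run it on worker-1, then make sure the same port is also free on ui-1 and on your machine.
- The function name pick_free_port is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick a random port in 49152-65535 that nothing on localhost
# is currently listening on. Uses bash's /dev/tcp redirection.
pick_free_port() {
  local attempt port
  for attempt in {1..20}; do
    # RANDOM is 0..32767, so RANDOM % 16384 lands in 0..16383.
    port=$(( 49152 + RANDOM % 16384 ))
    # A successful connect means something already listens there; try again.
    if ! (: > "/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "$port"
      return 0
    fi
  done
  # Give up after 20 busy ports in a row.
  return 1
}
```

For example, run PORT=$(pick_free_port) on worker-1 and start jupyter lab --port="$PORT" --no-browser --ip=0.0.0.0. Then use that same number on both sides of each -L flag in Step 2.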
Gotcha: My batch job gets killed!
If you notice JupyterLab shutting down unexpectedly with the message "received signal 15, stopping", the job scheduler may be the cause: signal 15 is SIGTERM, the signal schedulers typically send to end a job. In my case, the cluster management system shut down interactive jobs that didn't declare an expected duration or queue. Submit your job with the queue or expected walltime specified. For example, on my cluster, -q fast lets me run a notebook for 60 minutes.
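You can also state the time limit explicitly with -l walltime=HH:MM:SS. Below is a sketch assuming the same Torque-style PBS syntax as Step 1. The queue name fast is specific to my cluster and will likely differ on yours. The helper names walltime_from_minutes and request_notebook_job are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: request an interactive GPU job with an explicit queue and walltime.
# The queue name passed in (e.g. "fast") is cluster-specific; check your cluster's docs.

# Convert a duration in minutes into the HH:MM:SS format PBS uses for walltime.
walltime_from_minutes() {
  printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Same resources as Step 1, plus a queue and a walltime limit.
request_notebook_job() {
  local queue="$1" minutes="$2"
  qsub -I -q "$queue" -l nodes=1:ppn=4:gpus=1 -l walltime="$(walltime_from_minutes "$minutes")"
}
```

On ui-1, request_notebook_job fast 60 asks for a one-hour interactive job, so -l walltime=01:00:00 is passed to qsub.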
Acknowledgements
A special thanks goes to Lauris Cikovskis for the insightful discussions that helped shape this post.