University of Cambridge Computer Laboratory - GPU cluster storage maintenance
GPU cluster storage maintenance
Completed
Scheduled for 16 July, 2024 at 16:00 – 18:02
Affects
Virtual Machine Hosting
Under maintenance from 4:00 PM to 6:02 PM
GPUs
Under maintenance from 4:00 PM to 6:02 PM
Updates
Completed
16 July, 2024 at 18:02
This maintenance has been completed. Personal VMs can be started via Xen Orchestra (https://xo.cl.cam.ac.uk/) as needed.
Update
16 July, 2024 at 17:36
In progress
The outage has overrun due to a problem encountered during the storage server's RAM upgrade. Progress is being made: we can still upgrade the RAM and restore service, just not in the way we expected. Current estimate for restoration of service: 19:00-19:15.
Update
16 July, 2024 at 16:54
In progress
This work is ongoing and is likely to overrun due to an unexpected hardware problem.
In progress
16 July, 2024 at 16:00
Maintenance is now in progress.
Update
16 July, 2024 at 16:00
Planned
The server that hosts storage for the departmental GPU cluster needs an urgent security update and a reboot.
This will necessitate shutting down all GPU and CPU development VMs (dev-gpu-* and dev-cpu-*), including the shared servers dev-gpu-1 and dev-cpu-1. These VMs' disks and the associated data directories (GPU home directories and shared "gpuscratch" space) will be unavailable for about half an hour. Because shutting down and restarting the VM infrastructure takes time, the VMs themselves will be unavailable for longer: approximately an hour.
We will also take the opportunity to add RAM to the storage server to improve performance.
Planned
16 July, 2024 at 11:00
Reminder: this maintenance is taking place at 17:00 today and will require all dev-gpu-* and dev-cpu-* VMs to be shut down.