Resolved
Some GPU cluster VMs crashed

Started
December 15, 2023 at 2:17 AM
Status
Resolved after about 1 hour

Impact

Partial outage
Affected
Virtual Machine Hosting
GPUs
  • Resolved

    This incident has been resolved.

  • Identified

    Due to a problem during network maintenance, VMs on the departmental GPU cluster briefly lost access to their disks, which caused some VMs to crash. Affected CPU VMs will be rebooted. Affected GPU VMs will be shut down and can be started again from XO.