University of Cambridge Computer Laboratory - GN09 cooling failure – Incident details

GN09 cooling failure

Identified
Major outage
Started about 7 hours ago

Affected

Datacentres

Major outage from 10:05 AM to 12:00 AM

GN09

Major outage from 10:05 AM to 12:00 AM

Virtual Machine Hosting

Major outage from 10:05 AM to 12:00 AM

GPUs

Major outage from 10:05 AM to 12:00 AM

Secondary VM Hosts

Major outage from 10:05 AM to 12:00 AM

Data Storage

Major outage from 10:05 AM to 12:00 AM

Updates
  • Update

    We are continuing to work on a fix for this incident. A failed part on the chiller is about to be replaced.

  • Identified

    Most servers in GN09 have been shut down to protect against further hardware damage. Technicians are on site and investigating a suspected chiller fault.

  • Investigating

    Cooling of GN09 failed overnight. Temperatures are already very high, and some systems have failed or powered down. More servers will likely have to be shut down before this is resolved.