Affected
Major outage from 3:07 PM to 9:00 PM
- Resolved
This incident has been resolved. GN09 is fully operational. Most servers that were previously running have been restarted.
If you have a physical server that is not running, you may be able to start it yourself via https://console.caelum.cl.cam.ac.uk as usual, or contact service-desk@cst.cam.ac.uk.
VMs that were not set to start automatically have not been restarted. You can start VMs when you need them via https://xo.cl.cam.ac.uk as usual.
Contact service-desk@cst.cam.ac.uk if there are any remaining issues.
- Update
Cooling has been restored and is expected to remain stable. The chiller shut down because the chilled water circulation pumps stopped. Why the pumps stopped is not yet known and will be investigated next week, but we expect it to have been an isolated incident. The chiller still has one active alarm, which is not preventing operation but is still being investigated.
We are taking the opportunity of GN09 being shut down to perform some routine firmware and software updates on network hardware and storage systems, so we will not start turning servers back on quite yet, but expect to be able to do so shortly.
- Update
Progress has been made: the chiller is running again, but one problem is still under investigation. We are hopeful that servers can be turned back on today, but will await the all-clear from the chiller technician.
- Update
Most servers in GN09 are now off, and must remain off until further notice. The emergency technician has arrived and is investigating.
- Identified
The William Gates Building's chiller has a fault and has stopped running. Temperatures in our on-site data centre GN09 are rising rapidly. Engineers have been called out but it is likely that we will have to start shutting down servers in order to protect them.