University of Cambridge Computer Laboratory - Notice history


Caelum Console (server management) - Operational

100% - uptime
Feb 2025 · 99.92% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Request Tracker - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Other Internal Services - Operational

100% - uptime
Feb 2025 · 99.98% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

External Services - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Network - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 99.95% | Apr 2025 · 100.0%

GN09 - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 99.21% | Apr 2025 · 100.0%

WCDC - Operational

99% - uptime
Feb 2025 · 100.0% | Mar 2025 · 98.49% | Apr 2025 · 100.0%

Main VM Pool (WCDC) - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

GPUs - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Secondary VM Hosts - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Xen Orchestra - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Filer - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Archive Server - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 99.59% | Apr 2025 · 100.0%

Data Replication - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Other Secondary Storage Systems - Operational

100% - uptime
Feb 2025 · 100.0% | Mar 2025 · 100.0% | Apr 2025 · 100.0%

Third Party: Fastmail → General Availability - Operational

Third Party: Fastmail → Mail delivery - Operational

Third Party: Fastmail → Web client and mobile app - Operational

Third Party: Fastmail → Mail access (IMAP/POP) - Operational

Third Party: Fastmail → Login & sessions - Operational

Third Party: Fastmail → Contacts (CardDAV) - Operational

Notice history

Apr 2025

No notices reported this month

Mar 2025

archive-smb outage: hardware fault
  • Resolved

    We have implemented a workaround and have brought archive-smb back into service, with reduced resilience pending replacement of a failed system SSD.

  • Investigating

    Since the West Cambridge Data Centre electrical fault, a component in jerakeen/archive-smb (the "new" archive server, currently hosting all SMB/CIFS volumes plus a couple of NFS volumes) has failed. We are investigating.

Major incident: West Cambridge Data Centre electrical outage
  • Resolved
    This incident has been resolved.
  • Monitoring

    We observe that both power feeds in WCDC have now been restored. However, as we have had no information from UIS about this incident, we do not yet know whether power can be considered stable.

    The archive-smb outage is ongoing and tracked in a separate incident. We believe that all other departmental systems are working again.

  • Identified

    We observe that our equipment in the West Cambridge Data Centre lost power (both redundant feeds) around 10:50. Power has been partially restored (one feed) and most departmental systems are back online. However, there are ongoing outages affecting multiple other University systems and the University Data Network.

    archive-smb is still down and this is being investigated.

    If any systems (in particular virtual machines) did not start automatically and are needed, please start them via https://xo.cl.cam.ac.uk/ or contact service-desk@cst.cam.ac.uk.

Chiller fault
  • Resolved

    This incident has been resolved. GN09 is fully operational. Most servers that were previously running have been restarted.

    If you have a physical server that is not running, you may be able to start it yourself via https://console.caelum.cl.cam.ac.uk as usual, or contact service-desk@cst.cam.ac.uk.

    VMs that were not set to start automatically have not been restarted. You can start VMs when you need them via https://xo.cl.cam.ac.uk as usual.

    Contact service-desk@cst.cam.ac.uk if there are any remaining issues.

  • Update

    Cooling has been restored and is expected to remain stable. The chiller shut down because the chilled water circulation pumps stopped. Why the pumps stopped will be investigated next week, but we expect it to have been an isolated incident. The chiller still has one alarm present, which is not preventing operation but is still being investigated.

    We are taking the opportunity of GN09 being shut down to perform some routine firmware and software updates on network hardware and storage systems, so we will not start turning servers back on quite yet, but expect to be able to do so shortly.

  • Update

    Progress has been made: the chiller is running again, but one problem is still under investigation. We are hopeful that servers can be turned back on today, but will await the all-clear from the chiller technician.

  • Update

    Most servers in GN09 are now off, and must remain off until further notice. The emergency technician has arrived and is investigating.

  • Identified

    The William Gates Building's chiller has a fault and has stopped running. Temperatures in our on-site data centre GN09 are rising rapidly. Engineers have been called out but it is likely that we will have to start shutting down servers in order to protect them.

Feb 2025 to Apr 2025
