University of Cambridge Computer Laboratory - Notice history

100% - uptime

Caelum Console (server management) - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Request Tracker - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Other Internal Services - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

External Services - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Network - Operational

100% - uptime
Apr 2024 · 99.96%
May 2024 · 100.0%
Jun 2024 · 100.0%

GN09 - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

WCDC - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Main VM Pool (WCDC) - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

GPUs - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Secondary VM Hosts - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Xen Orchestra - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Filer - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Archive Server - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Data Replication - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Other Secondary Storage Systems - Operational

100% - uptime
Apr 2024 · 100.0%
May 2024 · 100.0%
Jun 2024 · 100.0%

Third Party: Fastmail → General Availability - Operational

Third Party: Fastmail → Mail delivery - Operational

Third Party: Fastmail → Web client and mobile app - Operational

Third Party: Fastmail → Mail access (IMAP/POP) - Operational

Third Party: Fastmail → Login & sessions - Operational

Third Party: Fastmail → Contacts (CardDAV) - Operational

Notice history

Jun 2024

No notices reported this month

May 2024

WGB emergency network maintenance
  • Completed
    May 06, 2024 at 10:42 PM
    Maintenance has completed successfully.
  • In progress
    May 06, 2024 at 10:15 PM
    Maintenance is now in progress.
  • Planned
    May 06, 2024 at 10:15 PM

    We will be updating the software on the core router/switch in the William Gates Building (gatwick) to try to mitigate the ongoing crashes (https://cl.instatus.com/clvva1e4b43187b8n2hqywstc0). This upgrade cannot be performed "live", so there will be an outage of approximately 20-30 minutes affecting the William Gates Building office network and filer. Other servers in GN09 should be largely unaffected.

WGB network problem under investigation
  • Resolved
    This incident has been resolved.
  • Monitoring

    The core switch/router in the William Gates Building (gatwick) appears to have crashed and rebooted, possibly a recurrence of the issues from a month ago (https://cl.instatus.com/clur1lte237417blopt08gvelk). Networking should have been restored (initially via one switch of the redundant pair that constitutes gatwick, whilst the other switch restarts).

Apr 2024

gatwick (WGB core network) crashed
  • Resolved
    This incident has been resolved.
  • Monitoring
    Update

    gatwick crashed and rebooted again at around 06:22, once more triggered by a routine configuration update.

    Earlier in the night we had attempted to install a software update, but because of an unrelated issue the routers refused to perform an 'In Service Software Upgrade'. Proceeding would therefore have caused more disruption, so we chose to roll back and delay the update until Cisco has published their notes about this particular version.

  • Monitoring
    Update

    The routers have remained stable since the crash, but we are going to do some further testing out of hours and install a software update. There may be some further disruption whilst that happens.

  • Monitoring
    Update

  • Monitoring

    The core router and switch in the William Gates Building (gatwick) seemingly crashed and rebooted at around 14:27.

    This appears to have been due to a software bug triggered by a routine configuration change. Although gatwick is a virtual switch/router comprising two independent physical systems, it seems that the entire virtual switch/router (both physical systems) rebooted simultaneously.

    This type of device takes a long time to reboot; in this case there would have been a little over 12 minutes during which the William Gates Building office network was cut off from the University network and the internet (followed by a further few minutes of instability). This would also have affected connectivity to filer and a few other core services hosted in the WGB.

    Investigation is ongoing into the reason for this outage.
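
    As a rough illustration of how an outage of this length relates to a monthly uptime figure, the sketch below converts minutes of downtime into a percentage of the month. The function name and the 12-minute input are assumptions for illustration only; how this status page actually calculates its figures is not stated, and its numbers may also count the period of instability and any later incidents in the same month.

```python
def uptime_percent(downtime_minutes: float, days_in_month: int) -> float:
    """Return uptime as a percentage of the total minutes in the month."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


# Assumption: exactly 12 minutes of outage in a 30-day month.
april = uptime_percent(12, 30)
print(f"{april:.3f}%")  # prints "99.972%"
```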

Apr 2024 to Jun 2024
