University of Cambridge Computer Laboratory - Notice history

100% - uptime

Caelum Console (server management) - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Request Tracker - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Other Internal Services - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

External Services - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Network - Operational

100% - uptime
Apr 2023 · 99.98%, May 2023 · 100.0%, Jun 2023 · 100.0%

GN09 - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

WCDC - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Main VM Pool (WCDC) - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

GPUs - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Secondary VM Hosts - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Xen Orchestra - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Filer - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Archive Server - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Data Replication - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Other Secondary Storage Systems - Operational

100% - uptime
Apr 2023 · 100.0%, May 2023 · 100.0%, Jun 2023 · 100.0%

Third Party: Fastmail → General Availability - Operational

Third Party: Fastmail → Mail delivery - Operational

Third Party: Fastmail → Web client and mobile app - Operational

Third Party: Fastmail → Mail access (IMAP/POP) - Operational

Third Party: Fastmail → Login & sessions - Operational

Third Party: Fastmail → Contacts (CardDAV) - Operational

Notice history

Jun 2023

Problems sending mail from Fastmail
  • Resolved

    Fastmail has repaired the fault. They report:

    This incident has been resolved. Network connectivity on a limited set of routes had been interrupted due to damage to a cross-site connection. No mail has been lost as a result of this incident, although external delivery of some mail was delayed. We will be taking steps to ensure future incidents like this will be resolved more quickly.

  • Monitoring

    We've observed that Fastmail is reachable once again from the Computer Lab network; we are awaiting confirmation from Fastmail that this is a permanent fix.

  • Identified

    This fault is also affecting:

    • Mail sent by Fastmail users, from an @cl.cam.ac.uk or @cst.cam.ac.uk address (or other configurations whereby Fastmail sends the email via msa.cl.cam.ac.uk), as Fastmail is unable to connect to msa.cl.cam.ac.uk;
    • Some other types of connection to Fastmail (e.g. web, IMAP) from some parts of the internet, including the University Data Network. As a workaround, if you're in the William Gates Building, you could temporarily use the 'wgb' wifi network to read your mail as that appears to be unaffected (but you still won't be able to send mail via msa.cl.cam.ac.uk);
    • Mail sent via msa.cl.cam.ac.uk from outside Fastmail, to a recipient whose email is hosted on Fastmail (we are implementing a workaround for this).

    We believe this is related to Fastmail's ongoing incident: https://fastmailstatus.com/clji5sqvg99220wtohhj3e2d2b

  • Investigating

    A networking problem between Fastmail and the Computer Lab is preventing Fastmail from connecting to our MSA (mail submission agent, msa.cl.cam.ac.uk). Users of Fastmail may find that they receive an error when trying to send mail, or that the mail is delayed. We have reported the problem to Fastmail as we believe the fault is Fastmail-specific.

Apr 2023

William Gates Building electrical shutdown
  • Completed
    April 11, 2023 at 2:12 PM

    The second floor switches have been fixed (in one case, replaced with spare hardware). All is now believed to be back to normal. As usual, contact sys-admin if anything seems wrong.

  • Update (In progress)
    April 11, 2023 at 12:43 PM

    The WC2D switch serving part of the SC corridor is up for the moment, now that we have manually rolled back its failed firmware update, but will reboot again (~15 minute outage) later this afternoon to redo the update.

    Preparation of a replacement switch for WC2B (a few rooms on SN/SW) is ongoing.

  • Update (In progress)
    April 11, 2023 at 11:13 AM

    One of the WC2B switches serving about 50% of connections in the northwest corner of the second floor (SN, SW) appears to have completely failed; a replacement will be set up and installed but this will take a couple of hours.

    The WC2D switch problem (part of SC) is still under investigation.

  • Update (In progress)
    April 11, 2023 at 10:44 AM

    A switch serving part of the SN and SW corridor also has a fault under investigation.

    Servers in GN09 should now be back to normal.

  • In progress
    April 11, 2023 at 10:33 AM

    A fault affecting the office network on parts of the SC corridor, arising after the power outage, is being investigated.

  • Completed
    April 11, 2023 at 10:30 AM

    Maintenance has completed successfully.

  • Update (In progress)
    April 11, 2023 at 10:20 AM

    The power maintenance has been completed; power was restored at around 10:49 (after a brief previous restoration).

    The office network (including wifi and phones) is now starting back up in sequence: ground floor initially, then first floor, then second floor. Each switch will go through an update process which may take 20 minutes or so. We expect that office networking is starting to come back up now but may take a little longer in some parts of the building.

    Servers in GN09 that were shut down can be restarted using the Caelum Console, https://console.caelum.cl.cam.ac.uk/ . Some servers known to be required are being started for you now.

  • Update (In progress)
    April 11, 2023 at 7:22 AM

    Generator transfer and power outage commencing shortly.

  • Update (In progress)
    April 11, 2023 at 7:03 AM

    There is a slight delay to the power outage because the engineer from UK Power Networks has been delayed. Please leave machines off for now.

  • Update (In progress)
    April 11, 2023 at 6:31 AM

    Server shutdown in GN09 will begin shortly. Power outage expected in 30 minutes.

  • In progress
    April 11, 2023 at 6:30 AM

    Maintenance is now in progress.

  • Update (Planned)
    April 11, 2023 at 6:30 AM

    Power to the William Gates Building will be shut down for approximately 3 hours on the morning of Tuesday 11th April, for routine maintenance of the 11kV substation.

    Generators will be connected to maintain power to the GN09 UPS-protected circuits and core University/Janet network infrastructure only.

    • Servers in GN09 which are powered solely from mains circuits will be shut down prior to the work and will remain powered off until the maintenance has completed. A list of affected servers is being prepared and will be published as soon as possible.

    • Servers in GN09 powered by the UPS are at risk of disruption as well, because the process of transferring the UPS to a generator supply runs a small risk of tripping RCDs protecting individual rack circuits within GN09. (Last time we tested this, one circuit out of approximately 48 tripped.)

    • Core services such as filer and the GN09 core network are expected to remain up, since they are protected by two independent UPSes.

    • The office network, wifi and phones will be down for the duration of the maintenance; the UPS batteries in the wiring cupboards will not last long enough to sustain service. However, we anticipate that the building will be closed during this work anyway, as there will be no lighting or other basic services.

    • We will take the opportunity to upgrade firmware on the office network switches, so there is a small chance of further disruption once the power returns if there is any problem with the firmware update on a switch.

    • Regardless, once power returns it may take 30 minutes or more for the office network, wifi and phones to be restored to service. This is because the switches take a long time to start up, especially during a firmware update, plus we may have to manually turn wiring cupboard circuits back on throughout the building.

  • Update (Planned)
    April 10, 2023 at 11:27 PM

    Reminder: the William Gates Building's electrical supply will be shut down at 08:00 BST. Please shut down computers in offices before that time. Affected servers in the datacentre, GN09, will be shut down for you starting at 07:30. The list of affected servers in GN09 can be found at http://www.wiki.cl.cam.ac.uk/rowiki/SysInfo/20230411AffectedMachines .

  • Planned
    April 05, 2023 at 10:41 AM

    The list of affected machines in GN09 that will be powered down prior to the electrical maintenance can be found at:

    http://www.wiki.cl.cam.ac.uk/rowiki/SysInfo/20230411AffectedMachines
