University of Cambridge Computer Laboratory - Notice history

100% - uptime

Caelum Console (server management) - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Request Tracker - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Other Internal Services - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

External Services - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Network - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

GN09 - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

WCDC - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Main VM Pool (WCDC) - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

GPUs - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Secondary VM Hosts - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Xen Orchestra - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Filer - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Archive Server - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Data Replication - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Other Secondary Storage Systems - Operational

100% - uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Third Party: Fastmail → General Availability - Operational

Third Party: Fastmail → Mail delivery - Operational

Third Party: Fastmail → Web client and mobile app - Operational

Third Party: Fastmail → Mail access (IMAP/POP) - Operational

Third Party: Fastmail → Login & sessions - Operational

Third Party: Fastmail → Contacts (CardDAV) - Operational

Notice history

Jul 2024

UIS firewall maintenance
  • Completed
    July 29, 2024 at 7:30 AM
    Maintenance has completed successfully
  • In progress
    July 29, 2024 at 5:00 AM
    Maintenance is now in progress
  • Planned
    July 29, 2024 at 5:00 AM

    UIS will be carrying out network maintenance on Monday 29 July from 6am to 8:30am (to physically reconnect a data centre firewall to a new network).

    The central IT services listed below will be unavailable for 10–30 minutes during this period:

    • CamSIS

    • CHRIS

    • CUFS

    • Research dashboard

    • X5

    • University DNS Service

    We recommend waiting until after 08:30 before logging in to the services listed above. They may come back online earlier than 8:30am, so you can try to log in if you have urgent work, but please be aware that you may experience connectivity issues.

    If you experience issues accessing the services after the maintenance period, try logging out and back in. If problems persist, please contact the UIS Service Desk.

Partial wifi outage in GS corridor west
  • Resolved
    Resolved

    The replacement wireless access point for GS corridor west has now been installed. The signal may be variable over the next day or two whilst the system automatically calibrates the new hardware. After that, please report any wifi signal issues to service-desk@cl.cam.ac.uk.

    We have also separately been planning a major upgrade to the wireless network in the building, expected to take place in a few months' time. That should improve the wifi signal and wireless network performance throughout the building.

  • Identified
    Update

    We've taken delivery of a replacement wireless access point for the western end of the GS corridor; however, some facilities work is needed to install it on the ceiling.

  • Identified
    Update

    A replacement wireless access point will be delivered and installed tomorrow.

  • Identified
    Identified

    The wireless access point in GS corridor west has apparently suffered a catastrophic hardware failure, and we are arranging a replacement.

  • Investigating
    Investigating

    We are aware that the wifi access point covering the western end of GS corridor stopped working last night, and are investigating.

Some mail into Fastmail is bouncing
  • Resolved
    Resolved

    We have not seen any recurrence of this issue since 2024-07-22 at 14:20, which coincides with when Fastmail fixed another problem, so we hypothesise that they fixed this one too by accident. However, they are continuing to investigate.

  • Monitoring
    Update

    Fastmail has clarified that they have fixed only one of the two issues that we reported, so mail may continue to bounce with "bare <LF>" errors. We are in communication with Fastmail engineers to investigate the remaining problem, which appears to be a complex software bug. Nevertheless, mail has not bounced for the past 21 hours, so the problem may have been fixed by accident.

  • Monitoring
    Monitoring

    Fastmail has now confirmed that they have implemented a fix for the problem.

  • Investigating
    Update

    We have still not heard back from Fastmail, but mail seems to have stopped bouncing around 14:20 yesterday. We will continue to press Fastmail for an update and will continue to monitor the situation ourselves.

    If you are a Fastmail user and would like us to check our logs for any mail to you that bounced, please contact service-desk@cl.cam.ac.uk.

  • Investigating
    Investigating

    We are aware that Fastmail are bouncing some mail to users of fm.cl.cam.ac.uk with an error message such as "Error: bare <LF> received". We will ask Fastmail to investigate.

GPU cluster storage maintenance
  • Completed
    July 16, 2024 at 6:02 PM

    This maintenance has been completed. Personal VMs can be started via Xen Orchestra (https://xo.cl.cam.ac.uk/) as needed.

  • Update
    July 16, 2024 at 5:36 PM
    In progress

    The outage has overrun due to a problem encountered during the storage server's RAM upgrade. Progress is being made; we can still upgrade the RAM and restore service, just not in the way we expected to. Current estimate for restoration of service: 19:00-19:15.

  • Update
    July 16, 2024 at 4:54 PM
    In progress

    This work is ongoing and is likely to overrun due to an unexpected hardware problem.

  • In progress
    July 16, 2024 at 4:00 PM
    Maintenance is now in progress
  • Update
    July 16, 2024 at 4:00 PM
    Planned

    The server that hosts storage for the departmental GPU cluster needs an urgent security update, and a reboot.

    This will necessitate shutting down all GPU and CPU development VMs (dev-gpu-* and dev-cpu-*), including the shared servers dev-gpu-1 and dev-cpu-1. These VMs' disks, as well as the associated data directories (GPU home directories and shared "gpuscratch" space), will be unavailable for about half an hour. As it will take time to shut down and restart the VM infrastructure, VMs will be unavailable for longer: approximately an hour.

    We will take the opportunity to add RAM to the storage server too, to improve performance.

  • Planned
    July 16, 2024 at 11:00 AM

    Reminder: this maintenance is taking place at 17:00 today and will require all dev-gpu-* and dev-cpu-* VMs to be shut down.

Jun 2024

No notices reported this month

May 2024

WGB emergency network maintenance
  • Completed
    May 06, 2024 at 10:42 PM
    Maintenance has completed successfully.
  • In progress
    May 06, 2024 at 10:15 PM
    Maintenance is now in progress
  • Planned
    May 06, 2024 at 10:15 PM

    We will be updating the software on the core router/switch in the William Gates Building (gatwick) to try to mitigate the ongoing crashes (https://cl.instatus.com/clvva1e4b43187b8n2hqywstc0). This upgrade cannot be performed "live", so there will be an outage of approximately 20-30 minutes affecting the William Gates Building office network and filer. Other servers in GN09 should be largely unaffected.

WGB network problem under investigation
  • Resolved
    Resolved
    This incident has been resolved.
  • Monitoring
    Monitoring

    The core switch/router in the William Gates Building (gatwick) appears to have crashed and rebooted; this is perhaps a recurrence of the issues seen a month ago (https://cl.instatus.com/clur1lte237417blopt08gvelk). Networking should have returned (initially via one switch of the redundant pair that constitutes gatwick, whilst the other switch restarts).
