University of Cambridge Computer Laboratory - Notice history

100% - uptime

Caelum Console (server management) - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Request Tracker - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Other Internal Services - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

External Services - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Network - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

GN09 - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

WCDC - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Main VM Pool (WCDC) - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

GPUs - Operational

100% - uptime
Dec 2024 · 99.02%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Secondary VM Hosts - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Xen Orchestra - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Filer - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Archive Server - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Data Replication - Operational

100% - uptime
Dec 2024 · 100.0%, Jan 2025 · 100.0%, Feb 2025 · 100.0%

Other Secondary Storage Systems - Operational

100% - uptime
Dec 2024 · 99.93%, Jan 2025 · 100.0%, Feb 2025 · 100.0%
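
As a rough guide to what the two December 2024 figures below 100% mean in practice (99.02% for GPUs and 99.93% for Other Secondary Storage Systems), the following sketch converts a monthly uptime percentage into implied downtime. It assumes the percentage is measured over the whole calendar month, which this page does not state, and the function name downtime_minutes is illustrative only.

```python
import calendar


def downtime_minutes(uptime_percent: float, year: int, month: int) -> float:
    """Minutes of downtime implied by a monthly uptime percentage.

    Assumption: uptime is measured over every minute of the calendar month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Figures reported on this page for December 2024.
gpus = downtime_minutes(99.02, 2024, 12)
storage = downtime_minutes(99.93, 2024, 12)

print(f"GPUs: about {gpus / 60:.1f} hours")  # about 7.3 hours
print(f"Other Secondary Storage Systems: about {storage:.0f} minutes")  # about 31 minutes
```

These are implied totals only; they say nothing about how the downtime was distributed across the month.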

Third Party: Fastmail → General Availability - Operational

Third Party: Fastmail → Mail delivery - Operational

Third Party: Fastmail → Web client and mobile app - Operational

Third Party: Fastmail → Mail access (IMAP/POP) - Operational

Third Party: Fastmail → Login & sessions - Operational

Third Party: Fastmail → Contacts (CardDAV) - Operational

Notice history

Feb 2025

No notices reported this month

Jan 2025

WCDC power distribution unit replacement
  • Completed
    January 30, 2025 at 5:25 PM
    Maintenance has completed successfully.
  • In progress
    January 30, 2025 at 2:30 PM
    Maintenance is now in progress.
  • Planned
    January 30, 2025 at 2:30 PM

    We will be replacing a power distribution unit (PDU) in our core infrastructure rack in the West Cambridge Data Centre, which powers the 1Gbps switches and a small number of other infrastructure systems. No user impact is expected, except for the following cases:

    • User servers tfc-app1, tfc-app2, tfc-app4 will lose networking for approximately half an hour

    • Verex access control management (card access updates etc.) will be unavailable for approximately half an hour

    • Minor delays in authenticating to Active Directory are possible, as one of the three domain controllers (adsrv07) will be turned off for approximately 45 minutes

    • BMC and serial console access to other systems in WCDC will be unavailable for approximately 30 minutes

    One of the two DHCP servers (sxp12) will also be turned off, but the other server should seamlessly handle all DHCP requests.

    This work is not related to the Estates electrical work happening in WCDC on the same day, but we have scheduled our work to take place during the same vulnerable period. Our PDU replacement will not reduce resilience any further.

Mailing lists rejecting email
  • Resolved

    We believe that Mimecast has unblocked us.

    Some unrelated issues with some mailing lists are still under investigation; as far as we know, they are not connected with the Mimecast problem. If you experience any further problems, please contact service-desk@cst.cam.ac.uk.

  • Identified

    We believe that UIS has successfully worked around this issue, and email sent to mailing lists from departmental addresses should now work.

    However, we now believe that this was a symptom of a broader problem with email to one particular anti-spam service provider, Mimecast. Email to other institutions that use Mimecast may also be affected. We are working on getting this resolved.

    If you do encounter the issue, you may be able to get email through successfully by sending from an @cam.ac.uk address.

  • Update

    As a workaround, messages should get through if you send mail using Outlook from your @cam.ac.uk address to the relevant internal @lists.cam.ac.uk address for the mailing list; if you don't know what that address is for a particular list, contact service-desk@cst.cam.ac.uk.

  • Investigating

    We are aware that UIS's mailing list service is rejecting some email sent from Exchange Online to University mailing lists via cl.cam.ac.uk/cst.cam.ac.uk aliases. We have asked UIS to investigate.

Dec 2024

Migration to Forward Email, and new outbound email servers
  • Completed
    December 31, 2024 at 3:30 PM

    Following a period of testing, email to the departmental domains cl.cam.ac.uk and cst.cam.ac.uk is now being routed by Forward Email. As previously announced, most people should not notice any change, but there will be subtle differences - particularly if you have custom mail filtering rules which rely on details of the legacy UIS or department email systems. Please contact service-desk@cst.cam.ac.uk if you notice any problems or need help to adapt your filtering rules. If you are not receiving email, contact us from an address hosted outside the department, such as your @cam.ac.uk address.

    We have also replaced the mail servers used for routing outbound email from the department:

    Again, you should not need to make any changes; your existing credentials and settings for sending email should continue to work.

    The outbound mail servers were previously tightly integrated with our legacy inbound email processing; the replacements are simple standalone mail servers that only handle outbound email. They in turn send mail to the internet via UIS's new outbound email service, smtp.cam.ac.uk.

GPU cluster storage fault
  • Resolved
    This incident has been resolved.
  • Update

    Personal dev-gpu / dev-cpu VMs can now be started via Xen Orchestra.

    Some VMs may need some maintenance in order to start:

    • VMs that were running during the incident may have unclean filesystems that need a repair. Generally you will see the boot process end with "(initramfs)" on the console. Contact service-desk@cst.cam.ac.uk for help.

    • VMs that have not been booted for a long time may need a manual update to /etc/fstab. If your VM appears to start but you have no home directory or your home directory is read-only, either run "sudo cl-update-system" then reboot, or contact service-desk@cst.cam.ac.uk for help.

    The shared servers dev-gpu-1 and dev-cpu-1 will be unavailable for a little while longer.

  • Update

    Access to GPU cluster home directories and scratch space has been restored using the new storage server; these are accessible from Lab-managed Linux systems outside the GPU VM cluster via /anfs/gpucluster/$USER and /anfs/gpuscratch respectively. You can access this data via SSH to slogin.cl.cam.ac.uk.

    dev-gpu-acs will be available shortly, for ACS students' use only.

    GPU/CPU development VMs and the shared servers dev-gpu-1 and dev-cpu-1 remain unavailable; copying their VM disks will take longer. They should be restored to service later this evening.
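
The access route described in this update can be sketched as follows. This is a minimal illustration that only builds the command rather than running it; it assumes your login name on slogin.cl.cam.ac.uk matches the $USER part of the home directory path, and the helper names gpu_home and list_command and the user "abc123" are hypothetical.

```python
from pathlib import PurePosixPath

# Host and paths as given in the notice above.
LOGIN_HOST = "slogin.cl.cam.ac.uk"
HOME_ROOT = PurePosixPath("/anfs/gpucluster")
SCRATCH = PurePosixPath("/anfs/gpuscratch")


def gpu_home(user: str) -> PurePosixPath:
    """Path of a user's GPU cluster home directory on Lab-managed Linux systems."""
    return HOME_ROOT / user


def list_command(user: str) -> list[str]:
    """Build (but do not run) an ssh command that lists both locations.

    Assumption: the ssh login name is the same as the directory name.
    """
    return [
        "ssh",
        f"{user}@{LOGIN_HOST}",
        "ls",
        "-ld",
        str(gpu_home(user)),
        str(SCRATCH),
    ]


# "abc123" is a placeholder user identifier, not a real account.
print(" ".join(list_command("abc123")))
```

The resulting list could be passed to subprocess.run on a machine with SSH access, but check the command first, since the paths only exist on Lab-managed systems.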

  • Update

    Please do not attempt to start or stop any dev-gpu or dev-cpu VM at this time. It will not succeed, and may leave your VM in a worse state.

  • Identified

    As the GPU cluster is currently unusable anyway due to a fault with the temporary storage server, and we have a replacement storage server ready to go into service, we will take this opportunity to migrate data to the new server. This may take a few hours.

    We believe that no data has been lost. The temporary storage is functioning, but the NFS service is not.

  • Update

    This issue is now also affecting clients which already have the filesystem mounted. They may see a permission error. Most dev-gpu/dev-cpu VMs have probably frozen as they can no longer access their disks.

  • Investigating

    We are investigating a problem whereby dev-gpu/dev-cpu home directories are failing to mount. The likely symptom is that GPU VMs will hang during boot, but VMs that are already running will keep working. Also, access to 'gpuscratch' paths may cause the client system to lock up.

    This is due to a suspected Linux kernel bug on a storage server.

    It is possible that some disruption will occur whilst we try to fix this.
