University of Cambridge Computer Laboratory - Notice history

97% - uptime

Caelum Console (server management) - Operational

99% - uptime
Oct 2023 · 100.0%, Nov 2023 · 97.90%, Dec 2023 · 100.0%

Request Tracker - Operational

100% - uptime
Oct 2023 · 100.0%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Other Internal Services - Operational

91% - uptime
Oct 2023 · 74.20%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

External Services - Operational

91% - uptime
Oct 2023 · 74.20%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Network - Operational

91% - uptime
Oct 2023 · 73.39%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

GN09 - Operational

100% - uptime
Oct 2023 · 100.0%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

WCDC - Operational

91% - uptime
Oct 2023 · 74.18%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Main VM Pool (WCDC) - Operational

91% - uptime
Oct 2023 · 74.20%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

GPUs - Operational

100% - uptime
Oct 2023 · 100.0%, Nov 2023 · 100.0%, Dec 2023 · 99.96%

Secondary VM Hosts - Operational

100% - uptime
Oct 2023 · 100.0%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Xen Orchestra - Operational

91% - uptime
Oct 2023 · 74.20%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Filer - Operational

100% - uptime
Oct 2023 · 100.0%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Archive Server - Operational

91% - uptime
Oct 2023 · 73.99%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Data Replication - Operational

100% - uptime
Oct 2023 · 100.0%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Other Secondary Storage Systems - Operational

91% - uptime
Oct 2023 · 74.20%, Nov 2023 · 100.0%, Dec 2023 · 100.0%

Third Party: Fastmail → General Availability - Operational

Third Party: Fastmail → Mail delivery - Operational

Third Party: Fastmail → Web client and mobile app - Operational

Third Party: Fastmail → Mail access (IMAP/POP) - Operational

Third Party: Fastmail → Login & sessions - Operational

Third Party: Fastmail → Contacts (CardDAV) - Operational
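
For reference, the quarterly "uptime" figures above appear to be roughly the day-weighted mean of the three monthly percentages, rounded to the nearest percent; this is an inference from the numbers shown, not a documented calculation. A minimal sketch of that assumption:

    # Assumed derivation of the quarterly uptime figures (not an official formula):
    # day-weighted mean of the monthly percentages, rounded to the nearest percent.
    days = {"Oct 2023": 31, "Nov 2023": 30, "Dec 2023": 31}

    def quarterly_uptime(monthly):
        """Day-weighted mean of monthly uptime percentages."""
        total_days = sum(days[month] for month in monthly)
        return sum(days[month] * pct for month, pct in monthly.items()) / total_days

    # "Other Internal Services": 74.20%, 100.0%, 100.0% -> about 91.3%, shown as 91%
    print(round(quarterly_uptime({"Oct 2023": 74.20, "Nov 2023": 100.0, "Dec 2023": 100.0})))  # 91
    # "Caelum Console": 100.0%, 97.90%, 100.0% -> about 99.3%, shown as 99%
    print(round(quarterly_uptime({"Oct 2023": 100.0, "Nov 2023": 97.90, "Dec 2023": 100.0})))  # 99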

Notice history

Dec 2023

Nov 2023

William Gates Building planned power outage
  • Completed
    November 25, 2023 at 3:35 PM

    This maintenance has been completed successfully.

    If any IT systems are still not working, please contact sys-admin. If any electrical circuits remain down, please contact building-services.

  • Update (In progress)
    November 25, 2023 at 12:46 PM

    Power to the building is gradually being restored. GN09 servers will be powered back up in stages once the cooling has brought the room back to its normal temperature.

    Office power and networking in parts of the building will remain down as the maintenance work in wiring cupboards is ongoing.

  • In progress
    November 25, 2023 at 10:03 AM

    The start of the electrical work was delayed by two hours due to a generator fault. The work is now under way but is likely to run past 12:00.

  • Planned
    November 25, 2023 at 8:00 AM

    The William Gates Building will be without power for the morning of 25th November, due to planned work on our electrical switchgear to facilitate the upcoming commissioning of a substantial amount of solar power generation.

    Electrical circuits in the William Gates Building datacentre, GN09, which are connected via the UPS should remain powered throughout the maintenance, running from a backup generator.

    Other electrical circuits will go down; in particular, a few research servers are connected only to non-UPS circuits. A list of these will be circulated.

    GN09 will also be without active cooling for the duration of this work; we will have temporary air blowers in place to reduce the buildup of hot air but we may need to shut down high-powered servers (for example GPU and FPGA servers) depending on the weather and temperature.

    The following list of machines in GN09 will lose power during this maintenance. If possible, please shut them down before the maintenance starts (otherwise we will try to shut them down by pressing the power button). Once we have announced that the maintenance is complete, you can start them again from the Caelum Console; please wait for that announcement before attempting to do so.

    Some other servers may be shut down as well, in particular GPU and FPGA servers, to reduce the electrical and/or thermal load in GN09.

    • virtual machines on the GPU cluster (dev-gpu-…, dev-cpu-… and others as notified separately)
    • grumpy
    • gxp06
    • tarawera
    • ngongotaha
    • all quorum servers
    • stix
    • story
    • L51 Raspberry Pi cluster
    • godzilla
    • tiger
    • baume
    • ctsrd-slave2
    • rama
    • cat
    • chericloud-switch
    • rado
    • wenger
    • wolf0/1/2
    • edale
    • glencoe
    • sakura
    • ran
    • nana
    • momo
    • gilling
    • sigyn
    • idun
    • heimdall
    • nikola01/02/03/04
    • acritarch
    • morello101-dev/102-dev/103-dev
    • sleepy
    • doc
    • sherwood
    • behemoth
    • leviathan
    • excalibur
    • bam
    • kinabalu
    • daintree
    • marpe
    • iphito
    • doris
    • asteria
    • all POETS servers
    • mauao
    • any other GPU or FPGA server observed to be drawing a lot of power on Friday evening or Saturday morning
  • Update (In progress)
    November 25, 2023 at 7:08 AM

    Shutdown of listed servers is now beginning.

  • In progress
    November 24, 2023 at 10:14 PM

    VMs on the GPU cluster are now shutting down. They can be started again from Xen Orchestra once the electrical work is finished, tomorrow afternoon.

  • Planned
    November 24, 2023 at 6:12 PM

    Shutdown of GPU VMs will begin at 22:00 today (Friday).

    Shutdown of affected physical servers will begin at 07:00 on Saturday morning.

Oct 2023

WCDC remedial works: ATS replacement
  • Completed
    October 24, 2023 at 2:07 PM

    Maintenance has completed successfully.

  • In progress
    October 24, 2023 at 1:30 PM

    Maintenance is now in progress.

  • Planned
    October 24, 2023 at 1:30 PM

    We are planning to replace an Automatic Transfer Switch in one of our racks in the West Cambridge Data Centre, to reduce the likely impact of future partial power outages.

    This device automatically switches other devices (servers, network and management infrastructure) between the data centre's two resilient electrical supplies to allow these to remain powered during an outage of one of the supplies. Though this would not have helped with last week's power outages (which affected both supplies simultaneously), in general the power systems in WCDC are designed to limit outages to a single supply at once. We know that the old ATS that we are currently using is not as reliable as it should be, and have a replacement ready. (An illustrative sketch of this switching behaviour appears at the end of this update.)

    The replacement will result in a loss of power to a few servers (those without their own dual-input power supplies), all of which are part of a resilient service so no user-visible outage is expected:

    • adsrv07 (one of three Active Directory servers for DC.CL.CAM.AC.UK)
    • adsrv03 (one of three Active Directory servers for AD.CL.CAM.AC.UK)
    • sxp12 (one of two DHCP servers)

    It will also result in loss of networking to a few things for about 10 minutes, as the 1Gbps switches will be power-cycled:

    • verex01
    • cctv01
    • tfc-app{1,2,4,5}
    • management of servers in WCDC (BMCs etc.)

    Besides that, no user-visible outage is expected.
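
    For illustration only, here is a minimal sketch of the kind of transfer logic an ATS implements, as described above. The voltages, thresholds and behaviour shown are assumptions for the example, not details of the actual device:

        # Illustrative-only sketch of automatic transfer switch (ATS) behaviour:
        # feed the load from supply A, transfer to supply B when A goes out of
        # tolerance, and transfer back once A recovers. All values are made up.
        NOMINAL_V = 230.0
        TOLERANCE_V = 23.0  # assume a +/-10% window counts as "healthy"

        def healthy(voltage):
            return abs(voltage - NOMINAL_V) <= TOLERANCE_V

        def select_feed(supply_a_v, supply_b_v, current):
            """Return which supply ('A' or 'B') should feed the load."""
            if current == "A" and not healthy(supply_a_v) and healthy(supply_b_v):
                return "B"   # supply A failed and B is healthy: transfer to B
            if current == "B" and healthy(supply_a_v):
                return "A"   # preferred supply A is back: transfer back
            return current   # otherwise stay on the current supply

        print(select_feed(0.0, 231.0, "A"))    # A has failed -> 'B'
        print(select_feed(229.5, 231.0, "B"))  # A restored   -> 'A'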

Major outage: West Cambridge Data Centre
  • Resolved

    Power to our racks in the West Cambridge Data Centre has been stable since the major electrical incident on 18th October, and following replacement of the Automatic Transfer Switch in our core infrastructure rack, our resilience to future partial power outages is improved.

    The data centre as a whole is however running with reduced power capacity, so the central HPC systems are mostly unavailable. The HPC team currently estimates a return to full service no later than Wednesday 1st November. All HPC users should already be receiving updates from the team by email.

    UIS's remedial works to restore the data centre's power capacity are ongoing. We have had no specific information about these other than that the next such works are tentatively scheduled for next week.

    We are closing this incident now as we have no information to suggest that we will experience further disruption; however, due to ongoing repairs to the electrical distribution infrastructure, things may of course change.

  • Monitoring
    Update

    UIS have announced that the main circuit breaker on power supply B will be replaced next week, provisionally on Tuesday at 10am and lasting no more than an hour. All our systems should automatically switch over to the alternate power supply (A) and be able to operate from only that supply during this work. However, we know that a few older systems may reboot during the transition, or may experience a few minutes' network outage as their local switch reboots.

    We will add a separate scheduled-maintenance incident with more information once UIS has confirmed the timing.

  • Monitoring
    Update

    There are indications from a third party that UIS is planning remedial work for Monday, which may again be disruptive. We will update this incident page as more information becomes available.

  • Monitoring
    Update

    UIS has said that all University services have been restored (with the exception of Research Computing Services / HPC which is expected to be online again this afternoon). We likewise believe that all departmental services are online and stable.

    Please contact sys-admin@cl.cam.ac.uk if anything is not as it should be.

    UIS are planning further remedial work in the data centre next week, which may impact services.

    Thank you for your patience during this incident.

  • Monitoring

    All services have been brought back up, with the exception (as before) of some personal or group virtual servers where we're not sure which are supposed to be running. If anything is down that should not be (or vice versa) please contact sys-admin.

    Services should still be considered at risk; we will continue to monitor.

  • Identified
    Update

    UIS have sent no further information to the University IT community, but we have learned via an external party (JISC) that the second outage at 19:32 was a deliberate, controlled power-down and that, as of 20:48, no further outage is expected. We are therefore starting to restore services now.

  • Identified
    Update

    We observe that the power has come back on in our WCDC racks, but with no announcement from UIS we are holding off on restoring services for a little while, as we believe the power may still be unreliable.

    Those machines that automatically powered themselves back on already may experience network outages, as we are taking this opportunity to do a little bit of maintenance that would ordinarily be service-affecting.

  • Identified

    The data centre lost power again.

  • Monitoring
    Update

    UIS has noted that although power was restored, there are ongoing electrical problems in the data centre and services should still be considered at risk of further disruption. They have an engineer en route to investigate.

  • Monitoring

    All services are now believed to be back up, with the possible exception of a few individuals' / research groups' virtual servers. There is also a chance that some services did not start properly if their servers came up before the network and storage infrastructure was ready for them. We are continuing to check services, but please contact sys-admin@cl.cam.ac.uk if you are experiencing any problems.

  • Identified

    The UIS West Cambridge Data Centre lost power at around 16:00. Power was restored at around 16:45. Much of our infrastructure was affected and is now restarting. We hope that most services will be back online very soon.

Disruption to Morello cluster network
  • Resolved

    This incident has been resolved (though the entire cluster was power-cycled after the earlier maintenance due to a datacentre-wide power outage).

    A further incident will be opened for the planned replacement of the temporary switch.

  • Monitoring

    We believe the temporary network setup is working. The faulty switch has been powered down. Please contact sys-admin if anything is not right.

  • Identified
    Update

    Repatching of servers is complete; all Morello systems in rack D9 are now connected to the same port on temporary switch wcdc-d9-sw2 that they were previously connected to on wcdc-d9-sw1.

    There will now be a brief outage to routing as we shut off the layer-3 functionality of wcdc-d9-sw1 and allow wcdc-d10-sw1 to take over.

    (The temporary setup involves D9 being daisy-chained from rack D10. The temporary switch is layer-2 only, so all layer-3 functionality - routing, DHCP - will be handled by wcdc-d10-sw1.)

  • Identified
    Update

    Repatching of Morello servers over to the new temporary switch is about to commence.

  • Identified
    Update

    Replacement of the faulty switch is likely to begin at 12:45. I hope to be able to set up the new temporary switch in parallel with the faulty switch to minimise disruption. Once the new switch is configured, each system will be replugged from the old switch to the new, with hopefully only a few seconds of disruption to each.

    A permanent replacement is being obtained (under warranty) and after that arrives we will plan another short outage to install it.

  • Identified
    Update

    The maintenance on the rack D10 switch is believed to have been successful, though work is ongoing to verify this.

    The replacement of the rack D9 switch will probably take place tomorrow (18 Oct) late morning / early afternoon.

    For the time being, we believe there is no service-affecting outage.

  • Identified

    The Morello cluster network requires urgent disruptive maintenance; it will imminently experience an outage affecting roughly half of the machines.

    The other machines will experience a longer outage tomorrow, as the switch serving those machines has a fault and needs replacement.

    This only affects the Morello cluster; if you don't know what that is, this incident does not affect you.
