<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>University of Cambridge Computer Laboratory Status - Incident history</title>
    <link>https://cl.instatus.com</link>
    <description>University of Cambridge Computer Laboratory</description>
    <pubDate>Mon, 7 Dec 2026 00:00:00 +0000</pubDate>
    
<item>
  <title>West Cambridge Data Centre major outage 7-22 December</title>
  <description>
    Type: Maintenance
    

    Affected Components: Data Replication, Main VM Pool (WCDC), Other Internal Services, Other Secondary Storage Systems, External Services, Archive Server, WCDC
    Dec 7, 00:00:00 GMT+0 - Identified - UIS are scheduling some major work on the West Cambridge Data Centre during the period 7-22 December 2026, including a 10-day complete electrical outage affecting departmental systems there. This is likely to be highly disruptive to both departmental and University IT. We are considering the impact to our departmental operations and any possible mitigations to keep basic services operational and will update this page when we have more information.

UIS&#039;s announcement follows.

### What is happening? 

The University is making significant upgrades at the West Cambridge Data Centre to remediate and improve the electrical and cooling works on site. The proposed dates for this work are 7-22 December.

### Why is a full power isolation required?

This work is required as part of ongoing improvements to the heating and cooling systems on the site. This will both improve resilience for current services, and equip the University to host the Dawn supercomputer, putting us at the forefront of emerging AI technology.

To complete the work safely, there will need to be a 10-day full power shutdown on the site. There will be additional days in which services will be disrupted while we power down and power up the site. We expect the work to be completed within the period 7-22 December 2026.

### Why is it happening at this time?

This will be a complex and challenging period of work. It is extremely difficult to find a suitable time for an extended power shutdown because the University operates year-round.

### How will this be managed?

A project team at UIS is coordinating this work. The team includes project management, infrastructure and service specialists, technical leads, and communications. Initially, we are working to understand the needs of the services hosted at WCDC and what the impact of the shutdown will be. We will seek to mitigate the risks and impact of the shutdown as far as is reasonably possible, and we will take action to maintain the security and integrity of the site.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    
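    <p><strong>Affected Components:</strong> Data Replication, Main VM Pool (WCDC), Other Internal Services, Other Secondary Storage Systems, External Services, Archive Server, WCDC</p>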
    &lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;00:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  UIS are scheduling some major work on the West Cambridge Data Centre during the period 7-22 December 2026, including a 10-day complete electrical outage affecting departmental systems there. This is likely to be highly disruptive to both departmental and University IT. We are considering the impact to our departmental operations and any possible mitigations to keep basic services operational and will update this page when we have more information.

UIS&#039;s announcement follows.

### What is happening? 

The University is making significant upgrades at the West Cambridge Data Centre to remediate and improve the electrical and cooling works on site. The proposed dates for this work are 7-22 December.

### Why is a full power isolation required?

This work is required as part of ongoing improvements to the heating and cooling systems on the site. This will both improve resilience for current services, and equip the University to host the Dawn supercomputer, putting us at the forefront of emerging AI technology.

To complete the work safely, there will need to be a 10-day full power shutdown on the site. There will be additional days in which services will be disrupted while we power down and power up the site. We expect the work to be completed within the period 7-22 December 2026.

### Why is it happening at this time?

This will be a complex and challenging period of work. It is extremely difficult to find a suitable time for an extended power shutdown because the University operates year-round.

### How will this be managed?

A project team at UIS is coordinating this work. The team includes project management, infrastructure and service specialists, technical leads, and communications. Initially, we are working to understand the needs of the services hosted at WCDC and what the impact of the shutdown will be. We will seek to mitigate the risks and impact of the shutdown as far as is reasonably possible, and we will take action to maintain the security and integrity of the site.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 7 Dec 2026 00:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmoh705191c9fz7t84yqbq6ua</link>
  <guid>https://cl.instatus.com/maintenance/cmoh705191c9fz7t84yqbq6ua</guid>
</item>

<item>
  <title>Network maintenance: GN09 servers</title>
  <description>
    Type: Maintenance
    Duration: 6 days, 9 hours and 22 minutes

    Affected Components: Caelum Console (server management), Network
    May 16, 04:00:00 GMT+0 - Identified - We will upgrade the software on our 1Gbps network infrastructure in GN09, in order to rectify a known urgent issue and to maintain the security and reliability of the network. This is expected to cause around 15-30 minutes&#039; outage affecting all servers in GN09, except those with 10/40/100Gbps connections.

It **will not** affect virtual servers on departmental infrastructure, GPU servers, departmental storage servers (filer etc.), offices, wifi or phones.
    May 9, 18:37:58 GMT+0 - Completed - This scheduled maintenance is no longer required. The maintenance was completed during other unplanned GN09 outages.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 6 days, 9 hours and 22 minutes</p>
    <p><strong>Affected Components:</strong> Caelum Console (server management), Network</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We will upgrade the software on our 1Gbps network infrastructure in GN09, in order to rectify a known urgent issue and to maintain the security and reliability of the network. This is expected to cause around 15-30 minutes&#039; outage affecting all servers in GN09, except those with 10/40/100Gbps connections.

It **will not** affect virtual servers on departmental infrastructure, GPU servers, departmental storage servers (filer etc.), offices, wifi or phones.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:37:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  This scheduled maintenance is no longer required. The maintenance was completed during other unplanned GN09 outages.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sat, 16 May 2026 04:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmo1hcu890006dozl5eg1gtnz</link>
  <guid>https://cl.instatus.com/maintenance/cmo1hcu890006dozl5eg1gtnz</guid>
</item>

<item>
  <title>GN09 cooling failure</title>
  <description>
    Type: Incident
    Duration: 8 hours and 31 minutes

    Affected Components: Secondary VM Hosts, GN09, GPUs, Filer
    May 9, 15:56:23 GMT+0 - Identified - We are continuing to work on a fix for this incident. A failed part on the chiller is about to be replaced.
    May 9, 18:00:36 GMT+0 - Monitoring - The chiller fault has been repaired. Services will now be brought back up (this could take a while). If you have servers in GN09, and access to Caelum Console, you are free to turn on your servers now.
    May 9, 18:36:33 GMT+0 - Resolved - This incident has been resolved. If your server did not boot back up or is not working, and you can&#039;t resolve this yourself please contact service-desk@cst.cam.ac.uk.
    May 9, 10:05:06 GMT+0 - Investigating - Cooling of GN09 failed overnight. Temperatures are already very high and some systems have failed / powered down. It is likely that more servers will have to be shut down before this is resolved.
    May 9, 13:00:23 GMT+0 - Identified - Most servers in GN09 have been shut down to protect against further hardware damage. Technicians are on site and investigating a suspected chiller fault.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 8 hours and 31 minutes</p>
    <p><strong>Affected Components:</strong> Secondary VM Hosts, GN09, GPUs, Filer</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:56:23&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We are continuing to work on a fix for this incident. A failed part on the chiller is about to be replaced.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:00:36&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  The chiller fault has been repaired. Services will now be brought back up (this could take a while). If you have servers in GN09, and access to Caelum Console, you are free to turn on your servers now.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:36:33&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved. If your server did not boot back up or is not working, and you can&#039;t resolve this yourself please contact service-desk@cst.cam.ac.uk.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:05:06&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  Cooling of GN09 failed overnight. Temperatures are already very high and some systems have failed / powered down. It is likely that more servers will have to be shut down before this is resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:00:23&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Most servers in GN09 have been shut down to protect against further hardware damage. Technicians are on site and investigating a suspected chiller fault.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sat, 9 May 2026 10:05:06 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmoy6gzxz03t2hmu3pkta8t1z</link>
  <guid>https://cl.instatus.com/incident/cmoy6gzxz03t2hmu3pkta8t1z</guid>
</item>

<item>
  <title>Power outage affecting William Gates Building</title>
  <description>
    Type: Incident
    Duration: 7 days, 19 hours and 24 minutes

    Affected Components: Caelum Console (server management), Secondary VM Hosts, Other Internal Services, GN09, GPUs, Filer
    May 2, 00:27:30 GMT+0 - Identified - Power appears to have been restored, and some systems have come back up automatically; however, full restoration of services will take time and will only begin once we are confident that the power supply is stable and the UPS batteries have had some time to recharge.

We are aware of at least one apparent hardware failure arising from this outage so far.
    May 1, 23:12:52 GMT+0 - Investigating - We are aware of a possible site-wide power outage affecting the William Gates Building, including servers in GN09.
    May 1, 23:21:01 GMT+0 - Identified - Servers in GN09 not connected to a UPS supply, and machines in offices, have lost power. Servers in GN09 connected to a UPS supply are likely to lose power in a few minutes. UK Power Networks has confirmed the outage and expects power to be restored around 01:30-02:30. Most departmental IT services will be unavailable until further notice.
    May 2, 01:51:09 GMT+0 - Monitoring - WGB and GN09 infrastructure is operational again. Most GN09 servers and services have been restored.

It is likely that some systems may not have started correctly, or may have entered a broken state if running elsewhere but relying on systems in GN09. This is especially likely if they use filer or other networked storage that lost power. Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) to report any issues.

The UPS supplying power to part of GN09 has apparently suffered a fault whilst powering back up after the outage, and one third of its capacity is currently inoperative; resilience is reduced until repairs can be arranged.
    May 9, 18:36:51 GMT+0 - Resolved - This incident was previously resolved.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 7 days, 19 hours and 24 minutes</p>
    <p><strong>Affected Components:</strong> Caelum Console (server management), Secondary VM Hosts, Other Internal Services, GN09, GPUs, Filer</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 2&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;00:27:30&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Power appears to have been restored, and some systems have come back up automatically; however, full restoration of services will take time and will only begin once we are confident that the power supply is stable and the UPS batteries have had some time to recharge.

We are aware of at least one apparent hardware failure arising from this outage so far.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 1&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;23:12:52&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are aware of a possible site-wide power outage affecting the William Gates Building, including servers in GN09.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 1&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;23:21:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Servers in GN09 not connected to a UPS supply, and machines in offices, have lost power. Servers in GN09 connected to a UPS supply are likely to lose power in a few minutes. UK Power Networks has confirmed the outage and expects power to be restored around 01:30-02:30. Most departmental IT services will be unavailable until further notice.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 2&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;01:51:09&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  WGB and GN09 infrastructure is operational again. Most GN09 servers and services have been restored.

It is likely that some systems may not have started correctly, or may have entered a broken state if running elsewhere but relying on systems in GN09. This is especially likely if they use filer or other networked storage that lost power. Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) to report any issues.

The UPS supplying power to part of GN09 has apparently suffered a fault whilst powering back up after the outage, and one third of its capacity is currently inoperative; resilience is reduced until repairs can be arranged.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:36:51&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident was previously resolved.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 1 May 2026 23:12:52 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmonj38x003ad4kdfaj4knwn3</link>
  <guid>https://cl.instatus.com/incident/cmonj38x003ad4kdfaj4knwn3</guid>
</item>

<item>
  <title>Network maintenance: West Cambridge Data Centre</title>
  <description>
    Type: Maintenance
    Duration: 30 minutes

    Affected Components: Network
    Apr 27, 04:00:00 GMT+0 - Identified - We will upgrade the software on our 1Gbps network infrastructure in the West Cambridge Data Centre, in order to rectify a known urgent issue and to maintain the security and reliability of the network. This is expected to cause around 15-30 minutes&#039; outage affecting physical servers in the department&#039;s racks in the West Cambridge Data Centre (except those with 10Gbps connections).

It **will not** affect virtual servers on departmental infrastructure, or storage servers (filer etc.).

It **will** affect the Morello cluster.
    Apr 27, 04:00:01 GMT+0 - Identified - Maintenance is now in progress.
    Apr 27, 04:30:00 GMT+0 - Completed - Maintenance has completed successfully.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 30 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We will upgrade the software on our 1Gbps network infrastructure in the West Cambridge Data Centre, in order to rectify a known urgent issue and to maintain the security and reliability of the network. This is expected to cause around 15-30 minutes&#039; outage affecting physical servers in the department&#039;s racks in the West Cambridge Data Centre (except those with 10Gbps connections).

It **will not** affect virtual servers on departmental infrastructure, or storage servers (filer etc.).

It **will** affect the Morello cluster.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 27 Apr 2026 04:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmo1h1zwa0883b7bi48pg5808</link>
  <guid>https://cl.instatus.com/maintenance/cmo1h1zwa0883b7bi48pg5808</guid>
</item>

<item>
  <title>Network maintenance: offices</title>
  <description>
    Type: Maintenance
    Duration: 30 minutes

    Affected Components: Network
    Apr 26, 04:00:00 GMT+0 - Identified - We will upgrade the software on the office network infrastructure, in order to rectify a known urgent issue and to maintain the security and reliability of the network. This is expected to cause around 15-30 minutes&#039; outage affecting wired ethernet connections in offices, labs and public areas of the William Gates Building, as well as wifi and phones.

This outage has been scheduled to take place automatically early on Sunday morning in an attempt to minimise disruption.

Similar outages affecting servers in GN09 and the West Cambridge Data Centre are being scheduled separately.
    Apr 26, 04:00:01 GMT+0 - Identified - Maintenance is now in progress.
    Apr 26, 04:30:00 GMT+0 - Completed - Maintenance has completed successfully.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 30 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We will upgrade the software on the office network infrastructure, in order to rectify a known urgent issue and to maintain the security and reliability of the network. This is expected to cause around 15-30 minutes&#039; outage affecting wired ethernet connections in offices, labs and public areas of the William Gates Building, as well as wifi and phones.

This outage has been scheduled to take place automatically early on Sunday morning in an attempt to minimise disruption.

Similar outages affecting servers in GN09 and the West Cambridge Data Centre are being scheduled separately.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;04:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sun, 26 Apr 2026 04:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmo1gutpj0bc2labisubcz4yw</link>
  <guid>https://cl.instatus.com/maintenance/cmo1gutpj0bc2labisubcz4yw</guid>
</item>

<item>
  <title>Archive server (jerakeen) rebooting</title>
  <description>
    Type: Maintenance
    Duration: 1 day, 11 hours and 15 minutes

    Affected Components: Archive Server
    Apr 16, 21:30:00 GMT+0 - Identified - Due to an issue that requires urgent attention, the archive server jerakeen will be rebooted shortly. An outage of 30-60 minutes is expected. This will affect access to certain paths in /auto/archive, /auto/gfxdisp and \\\\archive-smb.cl.cam.ac.uk.
    Apr 16, 21:30:01 GMT+0 - Identified - Maintenance is now in progress.
    Apr 16, 22:05:14 GMT+0 - Completed - Maintenance has completed successfully.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 day, 11 hours and 15 minutes</p>
    <p><strong>Affected Components:</strong> Archive Server</p>
    &lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Due to an issue that requires urgent attention, the archive server jerakeen will be rebooted shortly. An outage of 30-60 minutes is expected. This will affect access to certain paths in /auto/archive, /auto/gfxdisp and \\\\archive-smb.cl.cam.ac.uk.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:30:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:05:14&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 16 Apr 2026 21:30:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmo1zlfan01doeml3ld9pza09</link>
  <guid>https://cl.instatus.com/maintenance/cmo1zlfan01doeml3ld9pza09</guid>
</item>

<item>
  <title>Office network outage (WC1C)</title>
  <description>
    Type: Incident
    Duration: 12 minutes

    Affected Components: Network
    Mar 12, 13:14:00 GMT+0 - Identified - Due to an electrical disturbance, the network infrastructure serving network port IDs starting WC1C is rebooting and will be offline for a few more minutes. Wired and wireless networking in affected offices (parts of FE and FC corridors) will be unavailable for a short while.
    Mar 12, 13:25:54 GMT+0 - Resolved - This incident has been resolved.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 12 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 12&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:14:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Due to an electrical disturbance, the network infrastructure serving network port IDs starting WC1C is rebooting and will be offline for a few more minutes. Wired and wireless networking in affected offices (parts of FE and FC corridors) will be unavailable for a short while.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 12&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:25:54&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 12 Mar 2026 13:14:00 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmmni1zbn024e56sbkrq82ef5</link>
  <guid>https://cl.instatus.com/incident/cmmni1zbn024e56sbkrq82ef5</guid>
</item>

<item>
  <title>GN09 cooling fault</title>
  <description>
    Type: Incident
    Duration: 4 hours and 13 minutes

    Affected Components: Secondary VM Hosts, GN09, GPUs, Internal Services
    Mar 9, 13:00:35 GMT+0 - Monitoring - Provisionally, the cooling fault appears to have been rectified. We will allow the facility to reach its normal temperature again and will monitor stability before restarting the small number of servers that we shut down.
    Mar 9, 14:41:53 GMT+0 - Resolved - This incident has been resolved.
    Mar 9, 10:28:42 GMT+0 - Identified - Following some building maintenance this morning, cooling for our data centre GN09 is currently inoperative. An engineer has been urgently requested to attend.

It is likely that unless the problem can be solved quickly, we will have to shut down servers in GN09.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 4 hours and 13 minutes</p>
    <p><strong>Affected Components:</strong> , , , </p>
    &lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:00:35&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  Provisionally, the cooling fault appears to have been rectified. We will allow the facility to reach its normal temperature again and will monitor stability before restarting the small number of servers that we shut down.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:41:53&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:28:42&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Following some building maintenance this morning, cooling for our data centre GN09 is currently inoperative. An engineer has been urgently requested to attend.

It is likely that unless the problem can be solved quickly, we will have to shut down servers in GN09.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 9 Mar 2026 10:28:42 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmmj1gehm003jvhgbuzghifzj</link>
  <guid>https://cl.instatus.com/incident/cmmj1gehm003jvhgbuzghifzj</guid>
</item>

<item>
  <title>Delays to mail forwarded through cl.cam.ac.uk / cst.cam.ac.uk</title>
  <description>
    Type: Incident
    Duration: 21 hours and 27 minutes

    Affected Components: External Services
    Feb 19, 13:01:36 GMT+0 - Identified - We are aware that some mail forwarded through our departmental domains [cst.cam.ac.uk](http://cst.cam.ac.uk) and [cl.cam.ac.uk](http://cl.cam.ac.uk) is not getting through at the moment. This is particularly affecting mail forwarded to @cam.ac.uk addresses, and is happening due to a problem at our provider. We have contacted the provider and UIS in order to try to resolve this as quickly as possible.

We believe that most affected mail will get through eventually, once the problem is rectified. There is a chance that some mail will be bounced to its original sender.

Mail sent directly to @cam.ac.uk addresses is not affected.
    Feb 19, 18:35:45 GMT+0 - Monitoring - Our email routing provider (Forward Email) mitigated the issue at around 17:00. We will continue to monitor, but so far the mitigation appears to be successful.

This issue was due to a third party (Spamhaus) erroneously listing a Forward Email server as a source of spam. Numerous other email systems around the world (including both UIS and Microsoft) trust Spamhaus to identify malicious or compromised systems, and so had started to reject email from that specific Forward Email server, incorrectly believing it to be behaving maliciously. Forward Email have taken the affected server out of rotation until Spamhaus rectifies the issue.
    Feb 20, 10:28:54 GMT+0 - Resolved - This problem has been fully mitigated.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 21 hours and 27 minutes</p>
    <p><strong>Affected Components:</strong> External Services</p>
    &lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:01:36&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We are aware that some mail forwarded through our departmental domains [cst.cam.ac.uk](http://cst.cam.ac.uk) and [cl.cam.ac.uk](http://cl.cam.ac.uk) is not getting through at the moment. This is particularly affecting mail forwarded to @cam.ac.uk addresses, and is happening due to a problem at our provider. We have contacted the provider and UIS in order to try to resolve this as quickly as possible.

We believe that most affected mail will get through eventually, once the problem is rectified. There is a chance that some mail will be bounced to its original sender.

Mail sent directly to @cam.ac.uk addresses is not affected.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:35:45&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  Our email routing provider (Forward Email) mitigated the issue at around 17:00. We will continue to monitor, but so far the mitigation appears to be successful.

This issue was due to a third party (Spamhaus) erroneously listing a Forward Email server as a source of spam. Numerous other email systems around the world (including both UIS and Microsoft) trust Spamhaus to identify malicious or compromised systems, and so had started to reject email from that specific Forward Email server, incorrectly believing it to be behaving maliciously. Forward Email have taken the affected server out of rotation until Spamhaus rectifies the issue.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:28:54&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This problem has been fully mitigated.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 19 Feb 2026 13:01:36 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmltgzpsn0bnildpdzbvyam98</link>
  <guid>https://cl.instatus.com/incident/cmltgzpsn0bnildpdzbvyam98</guid>
</item>

<item>
  <title>Departmental GPU cluster storage maintenance</title>
  <description>
    Type: Maintenance
    Duration: 1 hour and 8 minutes

    Affected Components: GPUs, Other Secondary Storage Systems
    Dec 18, 16:30:00 GMT+0 - Identified - We have been advised by the manufacturer that the storage server that holds home directories and VM disks on the departmental GPU cluster needs an urgent firmware update in order to avoid a possible data-loss issue.

Due to the risk of data loss and the upcoming Christmas closure and staff leave, we think it&#039;s best to do this update at short notice on Thursday 18th December.

This will affect personal GPU/CPU development VMs named dev-gpu-\* and dev-cpu-\*, as well as the shared development servers dev-cpu-1, dev-gpu-2, dev-gpu-acs and dev-cpu-acs.

The shared servers will be shut down before the maintenance, and started again once it is complete.

Affected VMs will be shut down before the maintenance, and will not be started again automatically. You will be able to start them again via Xen Orchestra once the maintenance has completed - please check this page and await confirmation that it&#039;s OK to try to start your VM.

Dec 18, 16:30:01 GMT+0 - Identified - Maintenance is now in progress.

Dec 18, 17:38:19 GMT+0 - Completed - Maintenance has completed successfully.

The shared development servers are available again.

VMs can be started again via Xen Orchestra. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 hour and 8 minutes</p>
    <p><strong>Affected Components:</strong> GPUs, Other Secondary Storage Systems</p>
    &lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We have been advised by the manufacturer that the storage server that holds home directories and VM disks on the departmental GPU cluster needs an urgent firmware update in order to avoid a possible data-loss issue.

Due to the risk of data loss and the upcoming Christmas closure and staff leave, we think it&#039;s best to do this update at short notice on Thursday 18th December.

This will affect personal GPU/CPU development VMs named dev-gpu-\* and dev-cpu-\*, as well as the shared development servers dev-cpu-1, dev-gpu-2, dev-gpu-acs and dev-cpu-acs.

The shared servers will be shut down before the maintenance, and started again once it is complete.

Affected VMs will be shut down before the maintenance, and will not be started again automatically. You will be able to start them again via Xen Orchestra once the maintenance has completed - please check this page and await confirmation that it&#039;s OK to try to start your VM.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:30:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:38:19&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.

The shared development servers are available again.

VMs can be started again via Xen Orchestra.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 18 Dec 2025 16:30:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmja3nu14026azp07nlmmz8z9</link>
  <guid>https://cl.instatus.com/maintenance/cmja3nu14026azp07nlmmz8z9</guid>
</item>

<item>
  <title>University-wide network instability</title>
  <description>
    Type: Incident
    Duration: 1 hour and 29 minutes

    Affected Components: Network
    Oct 31, 15:49:14 GMT+0 - Investigating - We are observing some intermittent issues with the University&#039;s internet/Janet connection, which may impact performance of certain services and websites. This issue is outside of the department and is affecting the whole University.

Oct 31, 16:36:16 GMT+0 - Monitoring - The disruption seems to have ended, though we have not had any official confirmation that the fault is resolved.

Oct 31, 17:17:57 GMT+0 - Resolved - The incident appears to have ended.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 hour and 29 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 31&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:49:14&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are observing some intermittent issues with the University&#039;s internet/Janet connection, which may impact performance of certain services and websites. This issue is outside of the department and is affecting the whole University.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 31&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:36:16&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  The disruption seems to have ended, though we have not had any official confirmation that the fault is resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 31&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:17:57&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  The incident appears to have ended.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 31 Oct 2025 15:49:14 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmhf13pkf02ht10sxb6i24xhd</link>
  <guid>https://cl.instatus.com/incident/cmhf13pkf02ht10sxb6i24xhd</guid>
</item>

<item>
  <title>Urgent VPN2 maintenance</title>
  <description>
    Type: Maintenance
    Duration: 23 hours and 47 minutes

    Affected Components: Network
    Oct 27, 20:12:45 GMT+0 - Completed - Maintenance has completed successfully.

Oct 28, 20:00:00 GMT+0 - Identified - The servers hosting VPN2 will be restarted this evening to mitigate an urgent security issue. All connections to VPN2 will be dropped, and clients may be unable to reconnect for a few minutes.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 23 hours and 47 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;20:12:45&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 28&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;20:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The servers hosting VPN2 will be restarted this evening to mitigate an urgent security issue. All connections to VPN2 will be dropped, and clients may be unable to reconnect for a few minutes.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 28 Oct 2025 20:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmh9h8s1g01q8mw6gm6zl2x7i</link>
  <guid>https://cl.instatus.com/maintenance/cmh9h8s1g01q8mw6gm6zl2x7i</guid>
</item>

<item>
  <title>West Cambridge electrical instability</title>
  <description>
    Type: Incident
    Duration: 10 hours and 14 minutes

    
    Oct 3, 09:30:06 GMT+0 - Monitoring - We are aware of an instability of the electrical supply (&quot;brown out&quot;) in the William Gates Building around 10:12. This caused some network infrastructure and some computers (including some servers in GN09) to reboot but we believe that everything affected should now be working again. Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you are aware of any remaining problems.

Oct 3, 10:32:23 GMT+0 - Monitoring - A further electrical supply problem occurred at 11:05; PCs in offices and non-UPS-protected servers in GN09 may have again powered off or rebooted.

Oct 3, 19:44:20 GMT+0 - Resolved - The electrical supply has been stable since 11:05.

However the [UK Power Networks incident](https://www.ukpowernetworks.co.uk/power-cut/incident?incidentid=INCD-553828-Z&amp;originSector=CB3%200) is still open and indicates that the electrical supply to northwest Cambridge is currently rerouted around a faulty HV cable; this means there may be future brief interruptions whilst the fault is repaired. We don&#039;t know when this will happen or how disruptive it will be.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 10 hours and 14 minutes</p>
    
    &lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 3&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;09:30:06&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  We are aware of an instability of the electrical supply (&quot;brown out&quot;) in the William Gates Building around 10:12. This caused some network infrastructure and some computers (including some servers in GN09) to reboot but we believe that everything affected should now be working again. Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you are aware of any remaining problems.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 3&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:32:23&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  A further electrical supply problem occurred at 11:05; PCs in offices and non-UPS-protected servers in GN09 may have again powered off or rebooted.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 3&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;19:44:20&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  The electrical supply has been stable since 11:05.

However the [UK Power Networks incident](https://www.ukpowernetworks.co.uk/power-cut/incident?incidentid=INCD-553828-Z&amp;originSector=CB3%200) is still open and indicates that the electrical supply to northwest Cambridge is currently rerouted around a faulty HV cable; this means there may be future brief interruptions whilst the fault is repaired. We don&#039;t know when this will happen or how disruptive it will be.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 3 Oct 2025 09:30:06 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmgan8akq03miy78czyvzp1gp</link>
  <guid>https://cl.instatus.com/incident/cmgan8akq03miy78czyvzp1gp</guid>
</item>

<item>
  <title>GN09 cooling fault</title>
  <description>
    Type: Incident
    Duration: 3 hours and 26 minutes

    Affected Components: GPUs, GN09, Secondary VM Hosts
    Oct 1, 10:55:21 GMT+0 - Identified - Cooling for GN09 is currently inoperable. Engineers are on the way, but due to climbing temperatures, we will have to shut down all research and teaching servers shortly.

Oct 1, 13:44:40 GMT+0 - Identified - The chiller is operational again. We will start to bring affected services back online. This will take time and we will provide another update when this is complete.

Oct 1, 14:07:32 GMT+0 - Identified - Caelum users are free to turn their servers on again via the Caelum Console.

VM users are free to turn their VMs on again via Xen Orchestra.

Oct 1, 14:21:03 GMT+0 - Resolved - This incident has been resolved.

Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you are experiencing any ongoing issues. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 3 hours and 26 minutes</p>
    <p><strong>Affected Components:</strong> GPUs, GN09, Secondary VM Hosts</p>
    &lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 1&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:55:21&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Cooling for GN09 is currently inoperable. Engineers are on the way, but due to climbing temperatures, we will have to shut down all research and teaching servers shortly.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 1&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:44:40&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The chiller is operational again. We will start to bring affected services back online. This will take time and we will provide another update when this is complete.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 1&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:07:32&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Caelum users are free to turn their servers on again via the Caelum Console.

VM users are free to turn their VMs on again via Xen Orchestra.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 1&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:21:03&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.

Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you are experiencing any ongoing issues.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 1 Oct 2025 10:55:21 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmg7ve8no0nfm10wclsomjvsg</link>
  <guid>https://cl.instatus.com/incident/cmg7ve8no0nfm10wclsomjvsg</guid>
</item>

<item>
  <title>West Cambridge Data Centre full outage for planned electrical works</title>
  <description>
    Type: Maintenance
    Duration: 1 day, 16 hours and 38 minutes

    Affected Components: Archive Server, Data Replication, WCDC, Other Internal Services, Main VM Pool (WCDC), Other Secondary Storage Systems
    Sep 18, 13:46:40 GMT+0 - Identified - UIS has announced that the planned electrical works in WCDC have completed early. We are starting to restore our services now. This will take time to complete.

Sep 17, 16:00:00 GMT+0 - Identified - **This maintenance will not start until 17 September. Our status page platform has sent out a couple of incorrect emails about the timing of this maintenance; apologies for the confusion.**

---

UIS have notified us that essential electrical works will be carried out at the West Cambridge Data Centre (WCDC) on 18 September 09:00-17:00. The entire data centre will be switched off all day.

This will affect many departmental services (several of our servers are hosted in WCDC) as well as other services in the broader University. Many departmental services will be shut down on the evening of 17 September and will remain offline until at least the evening of 18 September.

We are planning the precise departmental impact and possible mitigations, and will add more information to this page in due course.

We know however that most departmental-hosted VMs (except GPU VMs) and departmental administrative systems (for example dbwebserver and the underlying databases) will be shut down for the duration.

Sep 17, 16:00:00 GMT+0 - Identified - **This maintenance will not start until 17 September. Our status page platform has sent out a couple of incorrect emails about the timing of this maintenance; apologies for the confusion.**

---

As previously announced, UIS have notified us that essential electrical works will be carried out at the West Cambridge Data Centre (WCDC) on 18 September 09:00-17:00. The entire data centre will be switched off all day.

This will affect many departmental services (several of our servers are hosted in WCDC) as well as other services in the broader University. Many departmental services will be shut down on the evening of 17 September and will remain offline until at least the evening of 18 September.

**More details on the precise impact:**

The following departmental systems will be unavailable:

* Departmental databases: dbwebserver, svr-win-db
* All research and teaching VMs, except dev-gpu-\* and dev-cpu-\*
* SSH servers: ely, svr-ssh-0 (use slogin.cl.cam.ac.uk instead, which will be changed to point at svr-ssh-1, hosted elsewhere)
* Webadmin
* Verex
* Archive servers (archive / berilia; archive-smb / jerakeen)
* Licence servers: lmserv-\*
* Cron servers (cron-serv\*)
* Weather station
* Undergraduate SSH servers: cl-student-ssh, cl-teaching-ecad
* WSUS
* Misc utility websites on svr-www-02, svr-www-03, www-dyn\*
* Legacy Windows remote desktop service (clrds / desktop)
* Legacy Subversion server (svn1)
* Legacy printing (CUPS, mDNS/Bonjour)
* Legacy VPL servers
* Legacy wiki server
* Morello: entire cluster
* EEG: beara, gola
* SRCF: egress, echo, enid
* TFC: tfc-app{1,2,4,10,11}

UIS have told us that the following University-wide services will be down all day on 18 September:

* The University Finance System (UFS)
* Cognos
* The Research Dashboard
* Tableau
* Research Computing Services will be unavailable from approximately 10:00 on 17 September until 17:00 on 19 September. This includes:  
   * Cambridge Research Cloud (Arcus)  
   * Cambridge Service for Data Driven Discovery (CSD3)  
   * Dawn  
   * Research Cold Store (RCS)  
   * Research Data Store (RDS)  
   * Research File Store (RFS)  
   * Secure Research Computing Platform (SRCP)

The following departmental systems will be operating **without resilience** and are at risk of disruption:

* VPN2
* Active Directory (adsrv07)
* DNS (authoritative and recursive)
* LDAP
* DHCP
* MSA, MTA (SMTP outbound)
* TGT servers
* Legacy email (on filer / using .forward files)
* cl-onserver / laira, march
* Filer disaster-recovery snapshots

The following departmental systems will have their outage mitigated, i.e. they will be rehomed in the William Gates Building prior to the outage:

* Request Tracker
* www.cl.cam.ac.uk, sysdata.cl.cam.ac.uk (svr-www-00, svr-www-01)
* Xen Orchestra

Sep 18, 17:27:07 GMT+0 - Identified - We are aware that some research/teaching virtual machines belonging to individuals and research groups have not automatically started. These are VMs that are not configured to automatically start, so we are not sure whether they are actually meant to be running. If your VM has not started and should be running continuously, contact us and we&#039;ll rectify this for next time.

Sep 19, 08:50:42 GMT+0 - Completed - Maintenance was completed successfully.

Sep 18, 14:10:46 GMT+0 - Identified - Departmental IT services have been restored. Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you encounter any remaining problems with these.

University-wide services are still being restored, per UIS&#039;s announced schedule, and may not be fully available again until tomorrow.

Some further work is needed to undo some temporary changes to departmental infrastructure, which will take place out-of-hours.

Sep 17, 16:12:37 GMT+0 - Identified - Maintenance is now in progress. Systems will gradually shut down over the course of this evening, and will not come back up until Thursday evening at the soonest.

We will start with some relatively low-impact network infrastructure firmware upgrades (which will cause intermittent loss of connectivity to Morello systems in WCDC) and storage software updates, and will then proceed to shut down our VM infrastructure.

Sep 18, 12:40:25 GMT+0 - Identified - Power was unexpectedly restored around 13:10 before the maintenance was announced as complete. This caused many systems to automatically start again. For safety, we are shutting affected systems down again now in case power is again lost.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 day, 16 hours and 38 minutes</p>
    <p><strong>Affected Components:</strong> , , , , , , </p>
    &lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:46:40&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  UIS has announced that the planned electrical works in WCDC have completed early. We are starting to restore our services now. This will take time to complete.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  **This maintenance will not start until 17 September. Our status page platform has sent out a couple of incorrect emails about the timing of this maintenance; apologies for the confusion.**

---

UIS have notified us that essential electrical works will be carried out at the West Cambridge Data Centre (WCDC) on 18 September 09:00-17:00. The entire data centre will be switched off all day.

This will affect many departmental services (several of our servers are hosted in WCDC) as well as other services in the broader University. Many departmental services will be shut down on the evening of 17 September and will remain offline until at least the evening of 18 September.

We are planning the precise departmental impact and possible mitigations, and will add more information to this page in due course.

We know however that most departmental-hosted VMs (except GPU VMs) and departmental administrative systems (for example dbwebserver and the underlying databases) will be shut down for the duration.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  **This maintenance will not start until 17 September. Our status page platform has sent out a couple of incorrect emails about the timing of this maintenance; apologies for the confusion.**

---

As previously announced, UIS have notified us that essential electrical works will be carried out at the West Cambridge Data Centre (WCDC) on 18 September 09:00-17:00. The entire data centre will be switched off all day.

This will affect many departmental services (several of our servers are hosted in WCDC) as well as other services in the broader University. Many departmental services will be shut down on the evening of 17 September and will remain offline until at least the evening of 18 September.

**More details on the precise impact:**

The following departmental systems will be unavailable:

* Departmental databases: dbwebserver, svr-win-db
* All research and teaching VMs, except dev-gpu-\* and dev-cpu-\*
* SSH servers: ely, svr-ssh-0 (use slogin.cl.cam.ac.uk instead, which will be changed to point at svr-ssh-1, hosted elsewhere)
* Webadmin
* Verex
* Archive servers (archive / berilia; archive-smb / jerakeen)
* Licence servers: lmserv-\*
* Cron servers (cron-serv\*)
* Weather station
* Undergraduate SSH servers: cl-student-ssh, cl-teaching-ecad
* WSUS
* Misc utility websites on svr-www-02, svr-www-03, www-dyn\*
* Legacy Windows remote desktop service (clrds / desktop)
* Legacy Subversion server (svn1)
* Legacy printing (CUPS, mDNS/Bonjour)
* Legacy VPL servers
* Legacy wiki server
* Morello: entire cluster
* EEG: beara, gola
* SRCF: egress, echo, enid
* TFC: tfc-app{1,2,4,10,11}

UIS have told us that the following University-wide services will be down all day on 18 September:

* The University Finance System (UFS)
* Cognos
* The Research Dashboard
* Tableau
* Research Computing Services will be unavailable from approximately 10:00 on 17 September until 17:00 on 19 September. This includes:  
   * Cambridge Research Cloud (Arcus)  
   * Cambridge Service for Data Driven Discovery (CSD3)  
   * Dawn  
   * Research Cold Store (RCS)  
   * Research Data Store (RDS)  
   * Research File Store (RFS)  
   * Secure Research Computing Platform (SRCP)

The following departmental systems will be operating **without resilience** and are at risk of disruption:

* VPN2
* Active Directory (adsrv07)
* DNS (authoritative and recursive)
* LDAP
* DHCP
* MSA, MTA (SMTP outbound)
* TGT servers
* Legacy email (on filer / using .forward files)
* cl-onserver / laira, march
* Filer disaster-recovery snapshots

The following departmental systems will have their outage mitigated, i.e. they will be rehomed in the William Gates Building prior to the outage:

* Request Tracker
* www.cl.cam.ac.uk, sysdata.cl.cam.ac.uk (svr-www-00, svr-www-01)
* Xen Orchestra&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:27:07&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We are aware that some research/teaching virtual machines belonging to individuals and research groups have not automatically started. These are VMs that are not configured to automatically start, so we are not sure whether they are actually meant to be running. If your VM has not started and should be running continuously, contact us and we&#039;ll rectify this for next time.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;08:50:42&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance was completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:10:46&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Departmental IT services have been restored. Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you encounter any remaining problems with these.

University-wide services are still being restored, per UIS&#039;s announced schedule, and may not be fully available again until tomorrow.

Some further work is needed to undo some temporary changes to departmental infrastructure, which will take place out-of-hours.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:12:37&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress. Systems will gradually shut down over the course of this evening, and will not come back up until Thursday evening at the soonest.

We will start with some relatively low-impact network infrastructure firmware upgrades (which will cause intermittent loss of connectivity to Morello systems in WCDC) and storage software updates, and will then proceed to shut down our VM infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Sep &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:40:25&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Power was unexpectedly restored around 13:10 before the maintenance was announced as complete. This caused many systems to automatically start again. For safety, we are shutting affected systems down again now in case power is again lost.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 17 Sep 2025 16:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmdhk81a1000i10bbeazohlof</link>
  <guid>https://cl.instatus.com/maintenance/cmdhk81a1000i10bbeazohlof</guid>
</item>

<item>
  <title>Various services inaccessible from the internet</title>
  <description>
    Type: Incident
    Duration: 6 hours and 2 minutes

    Affected Components: Network
    Jul 15, 11:14:29 GMT+0 - Identified - Since UIS&#039;s network maintenance this morning, they appear to have blocked some connections into the departmental network that had previously been allowed. For example, the CL MSA is currently unreachable from the internet so depending on your configuration you may be unable to send mail, and DNS servers on our network are unreachable. We are querying this urgently with UIS.

Jul 15, 11:27:06 GMT+0 - Identified - This outage is also preventing email from reaching departmental request-tracker services including the IT service desk, Building Services, Purchasing etc.

Jul 15, 11:51:39 GMT+0 - Identified - Mail to the small number of users still on the legacy filer-based email platform (i.e. using ~/.forward to route email) is also currently not working as a result of this incident. This affects: djg11 ejb1 fhk1 jac22 jmb25 km10 pb22 pes20 rnc1

Though we hope that UIS will rectify this for us soon, note that the legacy email platform will not last for much longer anyway and these people should plan their migration to another system. We can upon request reconfigure your email to forward directly to another address, for example your @cam.ac.uk Exchange Online mailbox, bypassing your .forward. Pending restoration of the IT service desk, please contact [mas90@cam.ac.uk](mailto:mas90@cam.ac.uk) from your @cam.ac.uk address or from a known external email address (as your @cl/@cst address will not be working) if you would like to discuss this.

Jul 15, 14:32:49 GMT+0 - Identified - UIS are working on fixing this for us. Some services that were being blocked by UIS have now been repaired. However inbound email to departmental systems such as Request Tracker and the legacy email platform is still blocked.

Jul 15, 15:18:58 GMT+0 - Monitoring - UIS implemented a fix and we are currently monitoring the result. Please report to [service-desk@cl.cam.ac.uk](mailto:service-desk@cl.cam.ac.uk) CC [mas90@cam.ac.uk](mailto:mas90@cam.ac.uk) if you are aware of any remaining networking issues or unavailable services.

Jul 15, 17:16:09 GMT+0 - Resolved - We believe this incident has been resolved.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 6 hours and 2 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:14:29&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Since UIS&#039;s network maintenance this morning, they appear to have blocked some connections into the departmental network that had previously been allowed. For example, the CL MSA is currently unreachable from the internet so depending on your configuration you may be unable to send mail, and DNS servers on our network are unreachable. We are querying this urgently with UIS.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:27:06&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  This outage is also preventing email from reaching departmental request-tracker services including the IT service desk, Building Services, Purchasing etc.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:51:39&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Mail to the small number of users still on the legacy filer-based email platform (i.e. using ~/.forward to route email) is also currently not working as a result of this incident. This affects: djg11 ejb1 fhk1 jac22 jmb25 km10 pb22 pes20 rnc1

Though we hope that UIS will rectify this for us soon, note that the legacy email platform will not last for much longer anyway and these people should plan their migration to another system. We can upon request reconfigure your email to forward directly to another address, for example your @cam.ac.uk Exchange Online mailbox, bypassing your .forward. Pending restoration of the IT service desk, please contact [mas90@cam.ac.uk](mailto:mas90@cam.ac.uk) from your @cam.ac.uk address or from a known external email address (as your @cl/@cst address will not be working) if you would like to discuss this.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:32:49&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  UIS are working on fixing this for us. Some services that were being blocked by UIS are now reachable again. However, inbound email to departmental systems such as Request Tracker and the legacy email platform is still blocked.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:18:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  UIS implemented a fix and we are currently monitoring the result. Please report to [service-desk@cl.cam.ac.uk](mailto:service-desk@cl.cam.ac.uk) CC [mas90@cam.ac.uk](mailto:mas90@cam.ac.uk) if you are aware of any remaining networking issues or unavailable services.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:16:09&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  We believe this incident has been resolved.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 15 Jul 2025 11:14:29 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmd4fqdig009v11wctuvd0zxd</link>
  <guid>https://cl.instatus.com/incident/cmd4fqdig009v11wctuvd0zxd</guid>
</item>

<item>
  <title>University network maintenance: complete internet outage</title>
  <description>
    Type: Maintenance
    Duration: 9 hours and 29 minutes

    Affected Components: External Services, Network
    Jul 15, 06:00:00 GMT+0 - Identified - UIS have announced network maintenance on 15th-16th July which will result in the entire University having no internet connection for at least an hour starting at 7am on 15th July. They aim to have the work completed by 9am on 15th July but may overrun.

Anyone working **locally**, for example in their office, **will** experience disruption, as the internet and cloud services will be inaccessible. Many on-premises systems will also be inaccessible as they rely on cloud services for some functions such as authentication of user accounts. This will affect all wired and wifi connections in the William Gates Building, including eduroam and wgb.

Anyone working **remotely** and trying to connect to on-premises University services (such as CHRIS, CUFS and any cl.cam.ac.uk system such as filer, slogin or an office computer) **will** experience disruption, as University servers will be inaccessible from the internet.

Anyone working remotely but only connecting to cloud services (for example, Microsoft applications such as Teams, SharePoint, Office 365, Exchange/Outlook) should not experience any disruption.

The full announcement from UIS follows:

&quot;We will be undertaking work to update and enhance our network service, equipment and security posture on Tuesday 15 July, between 07:00 and 09:00. We’re also reserving 16 July for any additional work we need to complete, but we aim to be finished on 15 July.

There will be some disruption to network connectivity between the University Data Network (UDN) and the internet during this work. Users should plan to avoid critical work on the network during the maintenance period.

This work includes the network maintenance that we postponed in December 2024.

What’s happening

We plan to change the network infrastructure between Janet and the UDN. This will include new border routers, network address translation (NAT) infrastructure, and a replacement for the intrusion prevention system (IPS). This will cause disruption to network connectivity between the UDN and the internet. 

There will be disruption for some users between 07:00 and 09:00 on 15 July, as follows:

* Users working **remotely** and connecting to cloud services (for example, Microsoft applications such as Teams) should not experience any disruption.
* Users working **remotely** and connecting to on-premise University services (such as CHRIS, CUFS) **will** experience disruption.
* Users working on the University network (for example, working in the office) **will** experience disruption to the internet and cloud services.
* Connectivity within the University network, (for example, working in an office and connecting to a University service such as CHRIS) should not experience any disruption.

There will be changes to the way the central NAT service is configured. [Read further details about our NAT service](https://help.uis.cam.ac.uk/service/network-services/datanetwork/nat).

What you should do 

Users should plan to avoid critical work on the network during 07:00 to 09:00 on the morning of 15 July. This includes hybrid meetings on the University estate. IT officers should advise their users of this disruption. 

We will issue a reminder about this work a week before it is scheduled to take place.

Any issues? 

If you have any queries about this work, please [contact the Service Desk](https://help.uis.cam.ac.uk/contact-us).&quot;

(Note that for the Department of Computer Science and Technology, the changes to the NAT service only affect eduroam.) Jul 15, 15:28:59 GMT+0 - Completed - Maintenance has completed successfully. Jul 15, 06:00:01 GMT+0 - Identified - Maintenance is now in progress 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 9 hours and 29 minutes</p>
    <p><strong>Affected Components:</strong> External Services, Network</p>
    &lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;06:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  UIS have announced network maintenance on 15th-16th July which will result in the entire University having no internet connection for at least an hour starting at 7am on 15th July. They aim to have the work completed by 9am on 15th July but may overrun.

Anyone working **locally**, for example in their office, **will** experience disruption, as the internet and cloud services will be inaccessible. Many on-premises systems will also be inaccessible as they rely on cloud services for some functions such as authentication of user accounts. This will affect all wired and wifi connections in the William Gates Building, including eduroam and wgb.

Anyone working **remotely** and trying to connect to on-premises University services (such as CHRIS, CUFS and any cl.cam.ac.uk system such as filer, slogin or an office computer) **will** experience disruption, as University servers will be inaccessible from the internet.

Anyone working remotely but only connecting to cloud services (for example, Microsoft applications such as Teams, SharePoint, Office 365, Exchange/Outlook) should not experience any disruption.

The full announcement from UIS follows:

&quot;We will be undertaking work to update and enhance our network service, equipment and security posture on Tuesday 15 July, between 07:00 and 09:00. We’re also reserving 16 July for any additional work we need to complete, but we aim to be finished on 15 July.

There will be some disruption to network connectivity between the University Data Network (UDN) and the internet during this work. Users should plan to avoid critical work on the network during the maintenance period.

This work includes the network maintenance that we postponed in December 2024.

What’s happening

We plan to change the network infrastructure between Janet and the UDN. This will include new border routers, network address translation (NAT) infrastructure, and a replacement for the intrusion prevention system (IPS). This will cause disruption to network connectivity between the UDN and the internet. 

There will be disruption for some users between 07:00 and 09:00 on 15 July, as follows:

* Users working **remotely** and connecting to cloud services (for example, Microsoft applications such as Teams) should not experience any disruption.
* Users working **remotely** and connecting to on-premise University services (such as CHRIS, CUFS) **will** experience disruption.
* Users working on the University network (for example, working in the office) **will** experience disruption to the internet and cloud services.
* Connectivity within the University network, (for example, working in an office and connecting to a University service such as CHRIS) should not experience any disruption.

There will be changes to the way the central NAT service is configured. [Read further details about our NAT service](https://help.uis.cam.ac.uk/service/network-services/datanetwork/nat).

What you should do 

Users should plan to avoid critical work on the network during 07:00 to 09:00 on the morning of 15 July. This includes hybrid meetings on the University estate. IT officers should advise their users of this disruption. 

We will issue a reminder about this work a week before it is scheduled to take place.

Any issues? 

If you have any queries about this work, please [contact the Service Desk](https://help.uis.cam.ac.uk/contact-us).&quot;

(Note that for the Department of Computer Science and Technology, the changes to the NAT service only affect eduroam.)&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:28:59&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;06:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 15 Jul 2025 06:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmcx9jhjo000b14oecoy40yxm</link>
  <guid>https://cl.instatus.com/maintenance/cmcx9jhjo000b14oecoy40yxm</guid>
</item>

<item>
  <title>GPU VM cluster maintenance</title>
  <description>
    Type: Maintenance
    Duration: 1 hour and 50 minutes

    Affected Components: GPUs
    Jun 22, 13:00:01 GMT+0 - Identified - Maintenance is now in progress Jun 22, 13:51:24 GMT+0 - Identified - Storage server maintenance is complete. The shared server dev-gpu-2 is coming back up. VM hypervisor upgrades are now beginning so personal dev-\* VMs will remain down. Jun 22, 14:50:15 GMT+0 - Completed - Maintenance has completed successfully. Jun 22, 14:33:47 GMT+0 - Identified - Some capacity to run user VMs is now back online. You may try to start your VM again via Xen Orchestra if you need it. If it fails to start, try again after half an hour when there should be more capacity available. Jun 22, 13:00:00 GMT+0 - Identified - The GPU VM cluster which hosts dev-gpu-\* and dev-cpu-\* virtual machines, and the associated storage server, requires some urgent software updates and hardware maintenance in order to rectify a couple of known problems. We propose to do this on Sunday; however this could be rescheduled if this would be particularly disruptive (contact [mas90](mailto:mas90@cl.cam.ac.uk) ASAP if so).

All dev-gpu-\* and dev-cpu-\* VMs plus the shared servers dev-gpu-1, dev-gpu-2, dev-cpu-1, dev-gpu-acs and dev-cpu-acs must be shut down during this maintenance, as the storage server that holds VM disks and home directories will be unavailable for a short time. Capacity to host VMs will gradually be restored during the maintenance as each VM host is updated and brought back online. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 hour and 50 minutes</p>
    <p><strong>Affected Components:</strong> GPUs</p>
    &lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:51:24&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Storage server maintenance is complete. The shared server dev-gpu-2 is coming back up. VM hypervisor upgrades are now beginning, so personal dev-\* VMs will remain down.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:50:15&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:33:47&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Some capacity to run user VMs is now back online. You may try to start your VM again via Xen Orchestra if you need it. If it fails to start, try again after half an hour when there should be more capacity available.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The GPU VM cluster which hosts dev-gpu-\* and dev-cpu-\* virtual machines, and the associated storage server, requires some urgent software updates and hardware maintenance in order to rectify a couple of known problems. We propose to do this on Sunday; however this could be rescheduled if this would be particularly disruptive (contact [mas90](mailto:mas90@cl.cam.ac.uk) ASAP if so).

All dev-gpu-\* and dev-cpu-\* VMs plus the shared servers dev-gpu-1, dev-gpu-2, dev-cpu-1, dev-gpu-acs and dev-cpu-acs must be shut down during this maintenance, as the storage server that holds VM disks and home directories will be unavailable for a short time. Capacity to host VMs will gradually be restored during the maintenance as each VM host is updated and brought back online.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sun, 22 Jun 2025 13:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmc3uxr73001kcp7c4wj0ss36</link>
  <guid>https://cl.instatus.com/maintenance/cmc3uxr73001kcp7c4wj0ss36</guid>
</item>

<item>
  <title>VM storage server repair (xene-pool1)</title>
  <description>
    Type: Maintenance
    Duration: 49 minutes

    Affected Components: Request Tracker, Main VM Pool (WCDC), Other Internal Services
    Jun 17, 16:00:01 GMT+0 - Identified - Maintenance is now in progress Jun 17, 16:00:00 GMT+0 - Identified - Following on from the [earlier unscheduled VM storage outage](https://cl.instatus.com/cmbyy60fb0019oku8gwzg9j11), we need to replace a failed memory module in the storage server that backs one of our departmental VM pools in order to restore performance and reliability.

This requires us to shut down all VMs on xene-pool1, which will affect the following departmental services:

* cl-student-ssh - Undergraduate SSH server
* MSA (partial outage, one of two servers affected)
* Request Tracker
* VPN2 (partial outage, one of two servers affected and new connections are already steered towards the other server)
* Departmental database server (SQL Server / svr-win-db / db-\*)
* Windows Remote Desktop service
* dbwebserver
* WSUS (Windows Updates)

And it will affect the following user VMs:

* cl-teaching-ecad
* dev-compilers0
* egress
* knot
* lmserv-mentor
* svr-papers
* svr-www-ecad
* svr-yg386-web

These will be shut down soon after 5pm and will remain off for approximately an hour. The at-risk window is given as 2.5 hours due to uncertainty with the exact timing.

We will take the opportunity to do some routine maintenance (software and firmware updates) of the storage system at the same time, in order to avoid a future need to do more scheduled maintenance. Jun 17, 16:49:24 GMT+0 - Completed - Maintenance has completed successfully. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 49 minutes</p>
    <p><strong>Affected Components:</strong> Request Tracker, Main VM Pool (WCDC), Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Following on from the [earlier unscheduled VM storage outage](https://cl.instatus.com/cmbyy60fb0019oku8gwzg9j11), we need to replace a failed memory module in the storage server that backs one of our departmental VM pools in order to restore performance and reliability.

This requires us to shut down all VMs on xene-pool1, which will affect the following departmental services:

* cl-student-ssh - Undergraduate SSH server
* MSA (partial outage, one of two servers affected)
* Request Tracker
* VPN2 (partial outage, one of two servers affected and new connections are already steered towards the other server)
* Departmental database server (SQL Server / svr-win-db / db-\*)
* Windows Remote Desktop service
* dbwebserver
* WSUS (Windows Updates)

And it will affect the following user VMs:

* cl-teaching-ecad
* dev-compilers0
* egress
* knot
* lmserv-mentor
* svr-papers
* svr-www-ecad
* svr-yg386-web

These will be shut down soon after 5pm and will remain off for approximately an hour. The at-risk window is given as 2.5 hours due to uncertainty with the exact timing.

We will take the opportunity to do some routine maintenance (software and firmware updates) of the storage system at the same time, in order to avoid a future need to do more scheduled maintenance.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:49:24&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 17 Jun 2025 16:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cmbz79mj40003vb2hngvz4kvd</link>
  <guid>https://cl.instatus.com/maintenance/cmbz79mj40003vb2hngvz4kvd</guid>
</item>

<item>
  <title>VM storage fault (xene-pool1)</title>
  <description>
    Type: Incident
    Duration: 10 hours and 47 minutes

    Affected Components: Request Tracker, Main VM Pool (WCDC), Other Internal Services
    Jun 16, 10:53:00 GMT+0 - Monitoring - The fault has been mitigated and affected VMs are now back online. The VMs will have to be shut down again within a few days to replace a failed hardware component. Some users connected to VPN2 may be disconnected shortly as one of the VPN gateway servers needs rebooting even though it is still partially working. Besides this, please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) in case of any remaining problems. Jun 16, 01:10:00 GMT+0 - Investigating - Overnight a hardware fault took down the storage server that backs one of our main departmental VM pools (xene-pool1). All VMs running on that pool failed, which included the departmental database server, dbwebserver, Request Tracker, cl-student-ssh, part of the MSA service and the Windows Remote Desktop service. Jun 16, 11:57:27 GMT+0 - Resolved - This incident has been resolved. However, the same VMs will need to be shut down when a replacement part arrives. This will be communicated separately. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 10 hours and 47 minutes</p>
    <p><strong>Affected Components:</strong> Request Tracker, Main VM Pool (WCDC), Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:53:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  The fault has been mitigated and affected VMs are now back online. The VMs will have to be shut down again within a few days to replace a failed hardware component. Some users connected to VPN2 may be disconnected shortly as one of the VPN gateway servers needs rebooting even though it is still partially working. Besides this, please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) in case of any remaining problems.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;01:10:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  Overnight a hardware fault took down the storage server that backs one of our main departmental VM pools (xene-pool1). All VMs running on that pool failed, which included the departmental database server, dbwebserver, Request Tracker, cl-student-ssh, part of the MSA service and the Windows Remote Desktop service.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:57:27&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved. However, the same VMs will need to be shut down when a replacement part arrives. This will be communicated separately.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 16 Jun 2025 01:10:00 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmbyy60fb0019oku8gwzg9j11</link>
  <guid>https://cl.instatus.com/incident/cmbyy60fb0019oku8gwzg9j11</guid>
</item>

<item>
  <title>Legacy CUPS printing from Macs disrupted</title>
  <description>
    Type: Incident
    Duration: 2 hours and 16 minutes

    Affected Components: Other Internal Services
    May 28, 14:50:21 GMT+0 - Resolved - This incident has been resolved. May 28, 12:34:32 GMT+0 - Investigating - We are investigating reports that printing to legacy printers from Macs is currently disrupted, possibly due to a Bonjour problem. We suggest using [DS-Print](https://www.cst.cam.ac.uk/local/sys/printers/ds-print) as a workaround. May 28, 12:57:32 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result.

Nevertheless we suggest that you get DS-Print set up on your devices anyway, as this system will fully replace the legacy CUPS server soon. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 2 hours and 16 minutes</p>
    <p><strong>Affected Components:</strong> Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 28&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:50:21&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 28&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:34:32&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are investigating reports that printing to legacy printers from Macs is currently disrupted, possibly due to a Bonjour problem. We suggest using [DS-Print](https://www.cst.cam.ac.uk/local/sys/printers/ds-print) as a workaround.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 28&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:57:32&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  We implemented a fix and are currently monitoring the result.

Nevertheless, we suggest that you get DS-Print set up on your devices anyway, as this system will fully replace the legacy CUPS server soon.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 28 May 2025 12:34:32 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmb7xgf4o0007ynpfj0d3cvx0</link>
  <guid>https://cl.instatus.com/incident/cmb7xgf4o0007ynpfj0d3cvx0</guid>
</item>

<item>
  <title>VM storage disruption (sxp32)</title>
  <description>
    Type: Incident
    Duration: 1 hour and 14 minutes

    Affected Components: Main VM Pool (WCDC)
    May 26, 16:01:00 GMT+0 - Identified - Virtual machines running on one of our legacy VM hosts (sxp32) have experienced a storage disruption, which is currently under investigation. Affected VMs will be rebooted and may need some filesystem repairs. May 26, 17:14:51 GMT+0 - Resolved - This incident has been resolved. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 hour and 14 minutes</p>
    <p><strong>Affected Components:</strong> Main VM Pool (WCDC)</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:01:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Virtual machines running on one of our legacy VM hosts (sxp32) have experienced a storage disruption, which is currently under investigation. Affected VMs will be rebooted and may need some filesystem repairs.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:14:51&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 26 May 2025 16:01:00 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cmb5brumm000cb1rcl4fw89us</link>
  <guid>https://cl.instatus.com/incident/cmb5brumm000cb1rcl4fw89us</guid>
</item>

<item>
  <title>archive-smb outage: hardware fault</title>
  <description>
    Type: Incident
    Duration: 10 hours and 8 minutes

    Affected Components: Archive Server
    Mar 27, 11:54:14 GMT+0 - Investigating - Since the [West Cambridge Data Centre electrical fault](https://cl.instatus.com/cm8rab7xk000mit1owze0h661), a component in jerakeen/archive-smb (the &quot;new&quot; archive server, currently hosting all SMB/CIFS volumes plus a couple of NFS volumes) has failed. We are investigating. Mar 27, 22:02:15 GMT+0 - Resolved - We have implemented a workaround and have brought archive-smb back into service, with reduced resilience pending replacement of a failed system SSD. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 10 hours and 8 minutes</p>
    <p><strong>Affected Components:</strong> Archive Server</p>
    &lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:54:14&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  Since the [West Cambridge Data Centre electrical fault](https://cl.instatus.com/cm8rab7xk000mit1owze0h661), a component in jerakeen/archive-smb (the &quot;new&quot; archive server, currently hosting all SMB/CIFS volumes plus a couple of NFS volumes) has failed. We are investigating.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:02:15&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  We have implemented a workaround and have brought archive-smb back into service, with reduced resilience pending replacement of a failed system SSD.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 27 Mar 2025 11:54:14 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm8raptl9001w7s0rw9ruet8j</link>
  <guid>https://cl.instatus.com/incident/cm8raptl9001w7s0rw9ruet8j</guid>
</item>

<item>
  <title>Major incident: West Cambridge Data Centre electrical outage</title>
  <description>
    Type: Incident
    Duration: 11 hours and 13 minutes

    Affected Components: WCDC
    Mar 27, 22:02:42 GMT+0 - Resolved - This incident has been resolved. Mar 27, 11:55:37 GMT+0 - Monitoring - We observe that both power feeds in WCDC have now been restored. However as we have had no information from UIS about this incident, we do not yet know whether power can be considered stable.

The archive-smb outage is ongoing and [tracked in a separate incident](https://cl.instatus.com/cm8raptl9001w7s0rw9ruet8j). We believe that all other departmental systems are working again. Mar 27, 10:50:00 GMT+0 - Identified - We observe that our equipment in the West Cambridge Data Centre lost power (both redundant feeds) around 10:50. Power has been partially restored (one feed) and most departmental systems are back online. However, there are ongoing outages affecting multiple other University systems and the University Data Network.

archive-smb is still down and this is being investigated.

If any systems (in particular virtual machines) did not automatically start and are needed, please start them via &lt;https://xo.cl.cam.ac.uk/&gt; or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk). 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 11 hours and 13 minutes</p>
    <p><strong>Affected Components:</strong> WCDC</p>
    &lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:02:42&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:55:37&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  We observe that both power feeds in WCDC have now been restored. However as we have had no information from UIS about this incident, we do not yet know whether power can be considered stable.

The archive-smb outage is ongoing and [tracked in a separate incident](https://cl.instatus.com/cm8raptl9001w7s0rw9ruet8j). We believe that all other departmental systems are working again.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 27&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:50:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We observe that our equipment in the West Cambridge Data Centre lost power (both redundant feeds) around 10:50. Power has been partially restored (one feed) and most departmental systems are back online. However, there are ongoing outages affecting multiple other University systems and the University Data Network.

archive-smb is still down and this is being investigated.

If any systems (in particular virtual machines) did not automatically start and are needed, please start them via &lt;https://xo.cl.cam.ac.uk/&gt; or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk).&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 27 Mar 2025 10:50:00 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm8rab7xk000mit1owze0h661</link>
  <guid>https://cl.instatus.com/incident/cm8rab7xk000mit1owze0h661</guid>
</item>

<item>
  <title>Chiller fault</title>
  <description>
    Type: Incident
    Duration: 5 hours and 52 minutes

    Affected Components: GN09
    Mar 7, 17:42:35 GMT+0 - Identified - Progress has been made; the chiller is running again but there is a problem still under investigation. We are hopeful that servers can be turned back on again today, but will await the all-clear from the chiller technician. Mar 7, 15:07:50 GMT+0 - Identified - The William Gates Building&#039;s chiller has a fault and has stopped running. Temperatures in our on-site data centre GN09 are rising rapidly. Engineers have been called out but it is likely that we will have to start shutting down servers in order to protect them. Mar 7, 17:03:57 GMT+0 - Identified - Most servers in GN09 are now off, and must remain off until further notice. The emergency technician has arrived and is investigating. Mar 7, 18:26:28 GMT+0 - Identified - Cooling has been restored and is expected to remain stable. The cause of the chiller shutting down was the chilled water circulation pumps stopping for some other reason, which will be investigated next week but which we expect to have been an isolated incident. The chiller still has one alarm present which is not preventing operation but is still being investigated.

We are taking the opportunity of GN09 being shut down to perform some routine firmware and software updates on network hardware and storage systems, so we will not start turning servers back on quite yet, but expect to be able to do so shortly. Mar 7, 21:00:10 GMT+0 - Resolved - This incident has been resolved. GN09 is fully operational. Most servers that were previously running have been restarted.

If you have a physical server that is not running, you may be able to start it yourself via &lt;https://console.caelum.cl.cam.ac.uk&gt; as usual, or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk).

VMs that were not set to start automatically have not been restarted. You can start VMs when you need them via &lt;https://xo.cl.cam.ac.uk&gt; as usual.

Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if there are any remaining issues. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 5 hours and 52 minutes</p>
    <p><strong>Affected Components:</strong> GN09</p>
    &lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:42:35&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Progress has been made; the chiller is running again but there is a problem still under investigation. We are hopeful that servers can be turned back on again today, but will await the all-clear from the chiller technician.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:07:50&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The William Gates Building&#039;s chiller has a fault and has stopped running. Temperatures in our on-site data centre GN09 are rising rapidly. Engineers have been called out but it is likely that we will have to start shutting down servers in order to protect them.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:03:57&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Most servers in GN09 are now off, and must remain off until further notice. The emergency technician has arrived and is investigating.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:26:28&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Cooling has been restored and is expected to remain stable. The cause of the chiller shutting down was the chilled water circulation pumps stopping for some other reason, which will be investigated next week but which we expect to have been an isolated incident. The chiller still has one alarm present which is not preventing operation but is still being investigated.

We are taking the opportunity of GN09 being shut down to perform some routine firmware and software updates on network hardware and storage systems, so we will not start turning servers back on quite yet, but expect to be able to do so shortly.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:00:10&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved. GN09 is fully operational. Most servers that were previously running have been restarted.

If you have a physical server that is not running, you may be able to start it yourself via &lt;https://console.caelum.cl.cam.ac.uk&gt; as usual, or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk).

VMs that were not set to start automatically have not been restarted. You can start VMs when you need them via &lt;https://xo.cl.cam.ac.uk&gt; as usual.

Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if there are any remaining issues.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 7 Mar 2025 15:07:50 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm7ywtq5n0001z9i5cohx5dj7</link>
  <guid>https://cl.instatus.com/incident/cm7ywtq5n0001z9i5cohx5dj7</guid>
</item>

<item>
  <title>WC2D switch failure</title>
  <description>
    Type: Incident
    Duration: 1 hour and 18 minutes

    Affected Components: Network
    Mar 7, 05:16:25 GMT+0 - Identified - A network switch serving some users on the second floor of the William Gates Building (particularly around SC corridor, affecting both wired and wifi connections) has failed. We are working to rectify this. Mar 7, 06:34:15 GMT+0 - Resolved - This incident has been resolved. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 hour and 18 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;05:16:25&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  A network switch serving some users on the second floor of the William Gates Building (particularly around SC corridor, affecting both wired and wifi connections) has failed. We are working to rectify this.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Mar &lt;var data-var=&#039;date&#039;&gt; 7&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;06:34:15&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 7 Mar 2025 05:16:25 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm7ybp514001jup0uudq6h8kk</link>
  <guid>https://cl.instatus.com/incident/cm7ybp514001jup0uudq6h8kk</guid>
</item>

<item>
  <title>Caelum Console, vpnpassword and sshkeys.cl unavailable</title>
  <description>
    Type: Incident
    Duration: 1 hour and 34 minutes

    Affected Components: Caelum Console (server management), Other Internal Services
    Feb 26, 12:59:22 GMT+0 - Resolved - We have updated a component (uwsgi) of the web front end, which we hope will help with the deadlock problem. Feb 26, 11:25:49 GMT+0 - Investigating - The Caelum Console application for managing servers in datacentres, along with the VPN2 password application and SSH key management application, are currently unavailable. We are investigating. Feb 26, 11:55:11 GMT+0 - Monitoring - The server has been rebooted (and one bug fixed in the SSH key management backend) to restore service.

We are continuing to investigate a recurring problem that has caused the web frontends to stop responding a few times now. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 hour and 34 minutes</p>
    <p><strong>Affected Components:</strong> Caelum Console (server management), Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:59:22&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  We have updated a component (uwsgi) of the web front end, which we hope will help with the deadlock problem.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:25:49&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  The Caelum Console application for managing servers in datacentres, along with the VPN2 password application and SSH key management application, are currently unavailable. We are investigating.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 26&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:55:11&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  The server has been rebooted (and one bug fixed in the SSH key management backend) to restore service.

We are continuing to investigate a recurring problem that has caused the web frontends to stop responding a few times now.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 26 Feb 2025 11:25:49 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm7ltxjgz0005llpizkur54r2</link>
  <guid>https://cl.instatus.com/incident/cm7ltxjgz0005llpizkur54r2</guid>
</item>

<item>
  <title>Delays to inbound email</title>
  <description>
    Type: Incident
    Duration: 43 minutes

    Affected Components: External Services
    Feb 13, 18:12:25 GMT+0 - Resolved - This incident has been resolved. Feb 13, 17:29:09 GMT+0 - Investigating - We are aware of delays affecting some email to @cl.cam.ac.uk and @cst.cam.ac.uk addresses. We suspect a problem at Forward Email. We don&#039;t think that any email will be lost, only delayed until the problem is fixed. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 43 minutes</p>
    <p><strong>Affected Components:</strong> External Services</p>
    &lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 13&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:12:25&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Feb &lt;var data-var=&#039;date&#039;&gt; 13&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:29:09&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are aware of delays affecting some email to @cl.cam.ac.uk and @cst.cam.ac.uk addresses. We suspect a problem at Forward Email. We don&#039;t think that any email will be lost, only delayed until the problem is fixed.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 13 Feb 2025 17:29:09 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm73m6qw2000cnldyo0m2oxvt</link>
  <guid>https://cl.instatus.com/incident/cm73m6qw2000cnldyo0m2oxvt</guid>
</item>

<item>
  <title>WCDC power distribution unit replacement</title>
  <description>
    Type: Maintenance
    Duration: 2 hours and 56 minutes

    Affected Components: WCDC, Other Internal Services
    Jan 30, 14:30:00 GMT+0 - Identified - We will be replacing a power distribution unit (PDU) in our core infrastructure rack in the West Cambridge Data Centre, which powers the 1Gbps switches and a small number of other infrastructure systems. No user impact is expected, except for the following cases:

* User servers tfc-app1, tfc-app2, tfc-app4 will lose networking for approximately half an hour
* Verex access control management (card access updates etc.) will be unavailable for approximately half an hour
* Minor delays in authenticating to Active Directory are possible, as one of the three domain controllers (adsrv07) will be turned off for approximately 45 minutes
* BMC and serial console access to other systems in WCDC will be unavailable for approximately 30 minutes

One of the two DHCP servers (sxp12) will also be turned off, but the other server should seamlessly handle all DHCP requests.

This work is not related to the [Estates electrical work](https://cl.instatus.com/cm6f6f1bm006d4w86h0j6sepu) happening in WCDC on the same day, but we have scheduled our work to take place during the same vulnerable period. Our PDU replacement will not reduce resilience any further. Jan 30, 14:30:01 GMT+0 - Identified - Maintenance is now in progress Jan 30, 17:25:33 GMT+0 - Completed - Maintenance has completed successfully. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 2 hours and 56 minutes</p>
    <p><strong>Affected Components:</strong> WCDC, Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 30&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We will be replacing a power distribution unit (PDU) in our core infrastructure rack in the West Cambridge Data Centre, which powers the 1Gbps switches and a small number of other infrastructure systems. No user impact is expected, except for the following cases:

* User servers tfc-app1, tfc-app2, tfc-app4 will lose networking for approximately half an hour
* Verex access control management (card access updates etc.) will be unavailable for approximately half an hour
* Minor delays in authenticating to Active Directory are possible, as one of the three domain controllers (adsrv07) will be turned off for approximately 45 minutes
* BMC and serial console access to other systems in WCDC will be unavailable for approximately 30 minutes

One of the two DHCP servers (sxp12) will also be turned off, but the other server should seamlessly handle all DHCP requests.

This work is not related to the [Estates electrical work](https://cl.instatus.com/cm6f6f1bm006d4w86h0j6sepu) happening in WCDC on the same day, but we have scheduled our work to take place during the same vulnerable period. Our PDU replacement will not reduce resilience any further.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 30&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:30:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 30&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:25:33&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 30 Jan 2025 14:30:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cm6fbm8y80043wcuyjjsus2xw</link>
  <guid>https://cl.instatus.com/maintenance/cm6fbm8y80043wcuyjjsus2xw</guid>
</item>

<item>
  <title>WCDC electrical survey: core infrastructure at risk</title>
  <description>
    Type: Maintenance
    Duration: 7 hours

    Affected Components: WCDC
    Jan 30, 16:00:00 GMT+0 - Completed - Maintenance has completed successfully Jan 30, 09:00:00 GMT+0 - Identified - Estates will be performing a survey of the electrical infrastructure in the West Cambridge Data Centre, which will require each of the two power feeds to our racks being individually powered down for short periods of time. All our equipment in these racks is redundantly powered from two feeds, and we do not anticipate any disruption; however we should consider our core infrastructure to be at risk during this work as we will be operating with no power redundancy; disruption would be possible during this work in the event of any problem with the data centre power infrastructure or with individual power supplies on our servers and network equipment. Jan 30, 09:00:01 GMT+0 - Identified - Maintenance is now in progress 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 7 hours</p>
    <p><strong>Affected Components:</strong> WCDC</p>
    &lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 30&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 30&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;09:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Estates will be performing a survey of the electrical infrastructure in the West Cambridge Data Centre, which will require each of the two power feeds to our racks being individually powered down for short periods of time. All our equipment in these racks is redundantly powered from two feeds, and we do not anticipate any disruption; however we should consider our core infrastructure to be at risk during this work as we will be operating with no power redundancy; disruption would be possible during this work in the event of any problem with the data centre power infrastructure or with individual power supplies on our servers and network equipment.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 30&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;09:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 30 Jan 2025 09:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cm6f6f1bm006d4w86h0j6sepu</link>
  <guid>https://cl.instatus.com/maintenance/cm6f6f1bm006d4w86h0j6sepu</guid>
</item>

<item>
  <title>Mailing lists rejecting email</title>
  <description>
    Type: Incident
    Duration: 5 days, 3 hours and 25 minutes

    Affected Components: Other Internal Services
    Jan 17, 14:39:11 GMT+0 - Investigating - We are aware that UIS&#039;s mailing list service is rejecting some email sent from Exchange Online to University mailing lists via cl.cam.ac.uk/cst.cam.ac.uk aliases. We have asked UIS to investigate. Jan 17, 16:05:31 GMT+0 - Investigating - As a workaround, messages should get through if you send mail using Outlook from your @cam.ac.uk address to the relevant internal @lists.cam.ac.uk address for the mailing list; if you don&#039;t know what that address is for a particular list, contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk). Jan 20, 12:04:08 GMT+0 - Identified - We believe that UIS has successfully worked around this issue, and email sent to mailing lists from departmental addresses should now work.

However, we now also believe that this was a symptom of a broader problem with email to one particular email anti-spam service provider, Mimecast. Email to other institutions which also use Mimecast may also be affected. We are working on getting this resolved.

If you do encounter the issue, you may be able to get email through successfully by sending from an @cam.ac.uk address. Jan 22, 18:03:54 GMT+0 - Resolved - We believe that Mimecast has unblocked us.

There are some unrelated issues with some mailing lists still under investigation, not connected in any way (as far as we know) with the Mimecast problem; if you experience any more problems please contact service-desk@cst.cam.ac.uk. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 5 days, 3 hours and 25 minutes</p>
    <p><strong>Affected Components:</strong> Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:39:11&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are aware that UIS&#039;s mailing list service is rejecting some email sent from Exchange Online to University mailing lists via cl.cam.ac.uk/cst.cam.ac.uk aliases. We have asked UIS to investigate.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:05:31&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  As a workaround, messages should get through if you send mail using Outlook from your @cam.ac.uk address to the relevant internal @lists.cam.ac.uk address for the mailing list; if you don&#039;t know what that address is for a particular list, contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk).&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:04:08&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We believe that UIS has successfully worked around this issue, and email sent to mailing lists from departmental addresses should now work.

However, we now also believe that this was a symptom of a broader problem with email to one particular email anti-spam service provider, Mimecast. Email to other institutions which also use Mimecast may also be affected. We are working on getting this resolved.

If you do encounter the issue, you may be able to get email through successfully by sending from an @cam.ac.uk address.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:03:54&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  We believe that Mimecast has unblocked us.

There are some unrelated issues with some mailing lists still under investigation, not connected in any way (as far as we know) with the Mimecast problem; if you experience any more problems please contact service-desk@cst.cam.ac.uk.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 17 Jan 2025 14:39:11 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm60v84r20059qqy0oqbmdw24</link>
  <guid>https://cl.instatus.com/incident/cm60v84r20059qqy0oqbmdw24</guid>
</item>

<item>
  <title>Migration to Forward Email, and new outbound email servers</title>
  <description>
    Type: Maintenance
    

    Affected Components: External Services
    Dec 31, 15:30:00 GMT+0 - Completed - Following a period of testing, email to the departmental domains cl.cam.ac.uk and cst.cam.ac.uk is now being routed by Forward Email. As [previously announced](https://lists.cam.ac.uk/sympa/arc/cl-department-members/2024-11/msg00013.html), most people should not notice any change, but there will be subtle differences - particularly if you have custom mail filtering rules which rely on details of the legacy UIS or department email systems. Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you notice any problems or need help to adapt your filtering rules. If you are not receiving email, contact us from an address hosted outside the department, such as your @cam.ac.uk address.

We have also replaced the mail servers used for routing outbound email from the department:

* [msa.cl.cam.ac.uk](https://www.cst.cam.ac.uk/local/sys/mail/msa)
* mail.cl.cam.ac.uk / mail-serv.cl.cam.ac.uk etc.

Again, you should not need to make any changes; your existing credentials and settings for sending email should continue to work.

These were previously tightly integrated with our legacy inbound email processing, and are now simple standalone mail servers that only handle outbound email. They in turn send mail to the internet via UIS&#039;s new outbound email service [smtp.cam.ac.uk](https://help.uis.cam.ac.uk/service/email/technical/sending). 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    
    <p><strong>Affected Components:</strong> External Services</p>
    &lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 31&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Following a period of testing, email to the departmental domains cl.cam.ac.uk and cst.cam.ac.uk is now being routed by Forward Email. As [previously announced](https://lists.cam.ac.uk/sympa/arc/cl-department-members/2024-11/msg00013.html), most people should not notice any change, but there will be subtle differences - particularly if you have custom mail filtering rules which rely on details of the legacy UIS or department email systems. Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you notice any problems or need help to adapt your filtering rules. If you are not receiving email, contact us from an address hosted outside the department, such as your @cam.ac.uk address.

We have also replaced the mail servers used for routing outbound email from the department:

* [msa.cl.cam.ac.uk](https://www.cst.cam.ac.uk/local/sys/mail/msa)
* mail.cl.cam.ac.uk / mail-serv.cl.cam.ac.uk etc.

Again, you should not need to make any changes; your existing credentials and settings for sending email should continue to work.

These were previously tightly integrated with our legacy inbound email processing, and are now simple standalone mail servers that only handle outbound email. They in turn send mail to the internet via UIS&#039;s new outbound email service [smtp.cam.ac.uk](https://help.uis.cam.ac.uk/service/email/technical/sending).&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 31 Dec 2024 15:30:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cm5cmqhjo004eb1q4893tn2pv</link>
  <guid>https://cl.instatus.com/maintenance/cm5cmqhjo004eb1q4893tn2pv</guid>
</item>

<item>
  <title>GPU cluster storage fault</title>
  <description>
    Type: Incident
    Duration: 9 hours and 43 minutes

    Affected Components: Other Secondary Storage Systems, GPUs
    Dec 17, 02:34:40 GMT+0 - Resolved - This incident has been resolved. Dec 16, 16:51:57 GMT+0 - Investigating - We are investigating a problem whereby dev-gpu/dev-cpu home directories are failing to mount. The likely symptom is that GPU VMs will hang during boot, but VMs that are already running will keep working. Also, access to &#039;gpuscratch&#039; paths may cause the client system to lock up.

This is due to a suspected Linux kernel bug on a storage server.

It is possible that some disruption will occur whilst we try to fix this. Dec 16, 17:15:23 GMT+0 - Investigating - This issue is now also affecting clients which already have the filesystem mounted. They may see a permission error. Most dev-gpu/dev-cpu VMs have probably frozen as they can no longer access their disks. Dec 16, 17:24:32 GMT+0 - Identified - As the GPU cluster is currently unusable anyway due to a fault with the temporary storage server, and we have a replacement storage server ready to go into service, we will take this opportunity to migrate data to the new server. This may take a few hours.

We believe that no data has been lost. The temporary storage is functioning, but the NFS service is not. Dec 16, 18:10:48 GMT+0 - Identified - Please do not attempt to start or stop any dev-gpu or dev-cpu VM at this time. It won&#039;t be successful, and might cause your VM to get into a more broken state. Dec 16, 18:37:10 GMT+0 - Identified - Access to GPU cluster home directories and scratch space has been restored using the new storage server; these are accessible **from Lab-managed Linux systems outside the GPU VM cluster** via /anfs/gpucluster/&#36;USER and /anfs/gpuscratch respectively. You can access this data via SSH to [slogin.cl.cam.ac.uk](http://slogin.cl.cam.ac.uk).

dev-gpu-acs will be available shortly, **for ACS students&#039; use only**.

GPU/CPU development VMs and the shared servers dev-gpu-1 and dev-cpu-1 remain unavailable; copying their VM disks will take longer. They should be restored to service later this evening. Dec 16, 23:27:51 GMT+0 - Identified - Personal dev-gpu / dev-cpu VMs can now be started via Xen Orchestra.

Some VMs may need some maintenance in order to start:

* VMs that were running during the incident may have unclean filesystems that need a repair. Generally you will see the boot process end with &quot;(initramfs)&quot; on the console. Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) for help.
* VMs that have not been booted for a long time may need a manual update to /etc/fstab. If your VM appears to start but you have no home directory or your home directory is read-only, either run &quot;sudo cl-update-system&quot; then reboot, or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) for help.

The shared servers dev-gpu-1 and dev-cpu-1 will be unavailable for a little while longer. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 9 hours and 43 minutes</p>
    <p><strong>Affected Components:</strong> Other Secondary Storage Systems, GPUs</p>
    &lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;02:34:40&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:51:57&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are investigating a problem whereby dev-gpu/dev-cpu home directories are failing to mount. The likely symptom is that GPU VMs will hang during boot, but VMs that are already running will keep working. Also, access to &#039;gpuscratch&#039; paths may cause the client system to lock up.

This is due to a suspected Linux kernel bug on a storage server.

It is possible that some disruption will occur whilst we try to fix this.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:15:23&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  This issue is now also affecting clients which already have the filesystem mounted. They may see a permission error. Most dev-gpu/dev-cpu VMs have probably frozen as they can no longer access their disks.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:24:32&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  As the GPU cluster is currently unusable anyway due to a fault with the temporary storage server, and we have a replacement storage server ready to go into service, we will take this opportunity to migrate data to the new server. This may take a few hours.

We believe that no data has been lost. The temporary storage is functioning, but the NFS service is not.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:10:48&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Please do not attempt to start or stop any dev-gpu or dev-cpu VM at this time. It won&#039;t be successful, and might cause your VM to get into a more broken state.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:37:10&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Access to GPU cluster home directories and scratch space has been restored using the new storage server; these are accessible **from Lab-managed Linux systems outside the GPU VM cluster** via /anfs/gpucluster/&#36;USER and /anfs/gpuscratch respectively. You can access this data via SSH to [slogin.cl.cam.ac.uk](http://slogin.cl.cam.ac.uk).

dev-gpu-acs will be available shortly, **for ACS students&#039; use only**.

GPU/CPU development VMs and the shared servers dev-gpu-1 and dev-cpu-1 remain unavailable; copying their VM disks will take longer. They should be restored to service later this evening.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Dec &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;23:27:51&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Personal dev-gpu / dev-cpu VMs can now be started via Xen Orchestra.

Some VMs may need some maintenance in order to start:

* VMs that were running during the incident may have unclean filesystems that need a repair. Generally you will see the boot process end with &quot;(initramfs)&quot; on the console. Contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) for help.
* VMs that have not been booted for a long time may need a manual update to /etc/fstab. If your VM appears to start but you have no home directory or your home directory is read-only, either run &quot;sudo cl-update-system&quot; then reboot, or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) for help.

The shared servers dev-gpu-1 and dev-cpu-1 will be unavailable for a little while longer.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 16 Dec 2024 16:51:57 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm4r9vnc5003qgrziishp9bfk</link>
  <guid>https://cl.instatus.com/incident/cm4r9vnc5003qgrziishp9bfk</guid>
</item>

<item>
  <title>Urgent storage server maintenance</title>
  <description>
    Type: Maintenance
    Duration: 1 hour and 19 minutes

    Affected Components: GPUs
    Nov 29, 15:04:43 GMT+0 - Identified - We now think that a storage server reboot **will** be required. VMs will be paused for a short while. Home directory and gpuscratch access from shared servers will be interrupted for a few minutes. Nov 29, 15:18:35 GMT+0 - Completed - This maintenance has been completed. Some dev-gpu/dev-cpu VMs may need rebooting if they lost write access to their filesystems. Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you encounter any problems. Nov 29, 14:00:00 GMT+0 - Identified - In the aftermath of the GPU cluster storage incident on 17 November, one of the servers that we are using as a temporary host for GPU cluster data needs an urgent power supply upgrade. During the data recovery we added additional SSDs to this server, which unexpectedly caused the system to report that its power usage could now theoretically exceed the capacity of its power supplies (even though in practice it does not).

We will be replacing the power supplies whilst the server is in use, as we believe this will not be disruptive, but there is a chance of a server reboot which would cause some temporary disruption to the GPU cluster. Nov 29, 14:00:01 GMT+0 - Identified - Maintenance is now in progress 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 hour and 19 minutes</p>
    <p><strong>Affected Components:</strong> </p>
    &lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:04:43&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We now think that a storage server reboot **will** be required. VMs will be paused for a short while. Home directory and gpuscratch access from shared servers will be interrupted for a few minutes.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:18:35&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  This maintenance has been completed. Some dev-gpu/dev-cpu VMs may need rebooting if they lost write access to their filesystems. Please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you encounter any problems.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  In the aftermath of the GPU cluster storage incident on 17 November, one of the servers that we are using as a temporary host for GPU cluster data needs an urgent power supply upgrade. During the data recovery we added additional SSDs to this server, which unexpectedly caused the system to report that its power usage could now theoretically exceed the capacity of its power supplies (even though in practice it does not).

We will be replacing the power supplies whilst the server is in use, as we believe this will not be disruptive, but there is a chance of a server reboot which would cause some temporary disruption to the GPU cluster.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 29 Nov 2024 14:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/cm41mhez900gppsu73mleis9w</link>
  <guid>https://cl.instatus.com/maintenance/cm41mhez900gppsu73mleis9w</guid>
</item>

<item>
  <title>GPU cluster storage fault</title>
  <description>
    Type: Incident
    Duration: 4 days, 17 hours and 58 minutes

    Affected Components: GPUs
    Nov 17, 16:36:46 GMT+0 - Investigating - We are investigating a fault with storage on the GPU cluster, which has caused some GPU virtual machines including the shared servers dev-gpu-1 and dev-cpu-1 to fail. Nov 17, 17:05:58 GMT+0 - Identified - A filesystem (ZFS) malfunction on the GPU cluster storage server was detected last night and is being investigated. The data is believed to be almost completely intact and available but there is some minor corruption which is hopefully only affecting archived data belonging to users who have left the department. However, when investigating this issue (running &quot;zpool scrub&quot;) the server froze for a few minutes, leading to timeouts on virtual machines&#039; disks.

VMs on this cluster (mostly named dev-cpu-\* / dev-gpu-\*) will have failed. Shared VMs have been rebooted. Personal VMs will be shut down and can be restarted from Xen Orchestra.

At this stage we cannot rule out further disruption, unfortunately. Nov 17, 17:23:02 GMT+0 - Monitoring - Some VMs on the cluster may fail to start due to filesystem issues. You may see an &quot;(initramfs)&quot; prompt on the console. If your VM does not start, contact service-desk@cst.cam.ac.uk. Nov 17, 18:04:10 GMT+0 - Identified - Further urgent maintenance is needed to investigate the filesystem fault, which seems to be getting worse. Running VMs have been paused. Nov 18, 00:09:01 GMT+0 - Monitoring - Storage is available again; however the current situation is fragile:

* A very small number of files in home directories on the GPU cluster are unreadable due to corruption
* Some filesystem safety features have been disabled
* There is a chance that the server may spontaneously reboot and/or that more corruption will occur, causing further disruption
* Performance may be slightly degraded
* More disruptive maintenance will be needed soon as the current solution is temporary

If you have important data on the GPU cluster, you are reminded to take your own backups of this data on another system. Nov 18, 11:01:03 GMT+0 - Identified - The NFS service locked up this morning; we are rebooting the server.

We plan to move the GPU VM disk storage service onto disaster-recovery hardware later today, in order to try to keep VMs stable even if home directories are not. However the performance will be reduced.

We are considering options for relocation of the GPU cluster home directory service onto alternate hardware. Nov 18, 17:09:57 GMT+0 - Identified - svr-compilers0 has been moved to temporary alternate hardware and storage, so is no longer affected by this outage. The outage affects dev-gpu-\*, dev-cpu-\*, /anfs/gpucluster and /anfs/gpuscratch.

Storage is currently available but is unstable, with some files unreadable. We are continuing to work on migrating affected data (VMs first, then home directories) off the affected server and on to temporary hardware, pending arrival of a replacement storage server. Nov 18, 17:27:48 GMT+0 - Identified - It is likely that dev-gpu/cpu VMs&#039; disks will be reverted to the state they were in at **midday today**, 2024-11-18, as we have a seemingly intact snapshot from that time on a disaster recovery server. If you keep using your VM, which is **not advised**, then any changes made to its local filesystem will probably be rolled back.

VMs will be shut down this evening in order to switch to an alternate storage system. Nov 18, 22:50:52 GMT+0 - Identified - Migration of VM disks will **not** take place this evening as an ongoing copy of the recovered data to a suitable server will not finish until tomorrow at the soonest.

dev-gpu-1 and dev-cpu-1&#039;s disks (system and scratch) have been migrated to alternative storage, so these shared servers should be a bit more stable now.

**Please leave your personal VM shut down if at all possible. If you need to copy data elsewhere, use dev-cpu-1, which should have access to the same home directory in most cases.**

Home directories on the whole GPU cluster remain unstable, and may be gradually corrupting further over time. **Home directories have been made read-only** in an attempt to reduce further corruption.

At the time of writing only one file in users&#039; home directories is known to be corrupted/unreadable -- specifically, any copy of libcusparse.so.12.3.1.170 in any location. (Any copy of that file is actually a pointer to the same data on disk due to filesystem deduplication, and that data is corrupted.) We expect to be able to recover this file in due course.

We are in the process of copying home directories to another temporary server so that further corruption does not happen and a stable service can be restored pending arrival of new hardware for a long-term fix. Nov 19, 14:13:07 GMT+0 - Identified - Personal dev-gpu/cpu VMs will now be shut down in order to complete the transfer of virtual disk storage to an alternate server. We hope that VMs will be available again within a few hours.

Home directories will remain accessible, read-only, via dev-cpu-1 and dev-gpu-1. Nov 19, 16:21:58 GMT+0 - Identified - VM disks have been moved to an alternate storage server; VMs should now be working as usual and you are welcome to start them using Xen Orchestra.

Home directories on the GPU VM cluster are still read-only, for now. These are being migrated off the failing server. Nov 20, 15:18:41 GMT+0 - Identified - /anfs/gpuscratch is now on an alternate server, and is writable again. If you are still unable to write to it, please try restarting autofs, or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) and we&#039;ll help.

We have nearly completed relocation of GPU cluster home directories (and /anfs/gpucluster) to an alternate server and will post another update later this afternoon. Nov 20, 18:01:42 GMT+0 - Identified - Migration of home directories is ongoing. This is taking longer than anticipated due to a combination of software issues on the systems used for data recovery; even though we have an almost-complete copy of the data, some time-consuming additional work is needed in order to bring that into service. Apologies for the ongoing disruption. Nov 21, 10:46:10 GMT+0 - Identified - Personal dev-gpu/cpu VMs will be paused for a few minutes in order to make some changes to their temporary storage server, needed in order to have it also serve home directories. Nov 21, 11:35:58 GMT+0 - Identified - Home directories on the GPU cluster are being switched across to an alternate server. Avoid rebooting/starting VMs, for the moment. The shared servers dev-gpu-1/dev-cpu-1 will be intermittently available whilst they are reconfigured. If you see an empty home directory, don&#039;t panic. Nov 21, 12:00:29 GMT+0 - Monitoring - Home directory storage has been migrated to a new server, and should be writable again. dev-gpu-1 and dev-cpu-1 are connected to the new storage. If your VM is currently running it may still be using the old server, i.e. you may still see a read-only home directory or other problems, but it can be connected to the new storage.

**If your VM&#039;s home directory is still read-only:** run &quot;sudo cl-update-system&quot; on your VM, wait for it to complete, then reboot the VM.

**If your VM&#039;s home directory appears empty: don&#039;t panic, your data still exists** but your VM has not mounted the right storage. Run &quot;sudo cl-update-system&quot; on your VM, wait for it to complete, then reboot the VM; if it still appears empty, email [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) \-- your VM&#039;s configuration probably just needs updating to point at the new storage, and our scripts to automatically do this for you didn&#039;t work on your VM for some reason, but we can fix it for you.

Some users are missing a file named &quot;libcusparse.so.12&quot; or &quot;libcusparse.so.12.3.1.170&quot; (generally within a Conda environment or Python virtualenv). You can simply download or recompile this file again in the same way that you originally created or generated this file, e.g. by reinstalling cusparse. We&#039;ll be restoring most of these files soon from a pristine NVIDIA/PyPI copy wherever possible. Nov 21, 13:01:38 GMT+0 - Monitoring - All copies of libcusparse, the one remaining corrupted file, have been restored from matching published versions (PyPI; NVIDIA; Conda). Therefore, we believe no data has been lost during this filesystem corruption incident. If you think you may be missing any data, or are still unable to use your VM or access your home directory, please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) as soon as possible.

Further work will be needed in due course to reinforce the resilience of the temporary storage servers, and then to migrate storage from its temporary servers to a new permanent storage server once that arrives. We will be in contact about that in due course. Nov 22, 10:34:43 GMT+0 - Resolved - We consider this incident resolved (albeit with temporary solutions); please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you are aware of any ongoing problems. Thanks again for your patience during this highly disruptive outage. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 4 days, 17 hours and 58 minutes</p>
    <p><strong>Affected Components:</strong> </p>
    &lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:36:46&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are investigating a fault with storage on the GPU cluster, which has caused some GPU virtual machines, including the shared servers dev-gpu-1 and dev-cpu-1, to fail.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:05:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  A filesystem (ZFS) malfunction on the GPU cluster storage server was detected last night and is being investigated. The data is believed to be almost completely intact and available but there is some minor corruption which is hopefully only affecting archived data belonging to users who have left the department. However, when investigating this issue (running &quot;zpool scrub&quot;) the server froze for a few minutes, leading to timeouts on virtual machines&#039; disks.

VMs on this cluster (mostly named dev-cpu-\* / dev-gpu-\*) will have failed. Shared VMs have been rebooted. Personal VMs will be shut down and can be restarted from Xen Orchestra.

At this stage we cannot rule out further disruption, unfortunately.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:23:02&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  Some VMs on the cluster may fail to start due to filesystem issues. You may see an &quot;(initramfs)&quot; prompt on the console. If your VM does not start, contact service-desk@cst.cam.ac.uk.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:04:10&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Further urgent maintenance is needed to investigate the filesystem fault, which seems to be getting worse. Running VMs have been paused.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;00:09:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  Storage is available again; however the current situation is fragile:

* A very small number of files in home directories on the GPU cluster are unreadable due to corruption
* Some filesystem safety features have been disabled
* There is a chance that the server may spontaneously reboot and/or that more corruption will occur, causing further disruption
* Performance may be slightly degraded
* More disruptive maintenance will be needed soon as the current solution is temporary

If you have important data on the GPU cluster, you are reminded to take your own backups of this data on another system.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:01:03&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The NFS service locked up this morning; we are rebooting the server.

We plan to move the GPU VM disk storage service onto disaster-recovery hardware later today, in order to try to keep VMs stable even if home directories are not. However the performance will be reduced.

We are considering options for relocation of the GPU cluster home directory service onto alternate hardware.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:09:57&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  svr-compilers0 has been moved to temporary alternate hardware and storage, so is no longer affected by this outage. The outage affects dev-gpu-\*, dev-cpu-\*, /anfs/gpucluster and /anfs/gpuscratch.

Storage is currently available but is unstable, with some files unreadable. We are continuing to work on migrating affected data (VMs first, then home directories) off the affected server and on to temporary hardware, pending arrival of a replacement storage server.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:27:48&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  It is likely that dev-gpu/cpu VMs&#039; disks will be reverted to the state they were in at **midday today**, 2024-11-18, as we have a seemingly intact snapshot from that time on a disaster recovery server. If you keep using your VM, which is **not advised**, then any changes made to its local filesystem will probably be rolled back.

VMs will be shut down this evening in order to switch to an alternate storage system.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 18&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:50:52&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Migration of VM disks will **not** take place this evening as an ongoing copy of the recovered data to a suitable server will not finish until tomorrow at the soonest.

dev-gpu-1 and dev-cpu-1&#039;s disks (system and scratch) have been migrated to alternative storage, so these shared servers should be a bit more stable now.

**Please leave your personal VM shut down if at all possible. If you need to copy data elsewhere, use dev-cpu-1, which should have access to the same home directory in most cases.**

Home directories on the whole GPU cluster remain unstable, and may be gradually corrupting further over time. **Home directories have been made read-only** in an attempt to reduce further corruption.

At the time of writing only one file in users&#039; home directories is known to be corrupted/unreadable -- specifically, any copy of libcusparse.so.12.3.1.170 in any location. (Any copy of that file is actually a pointer to the same data on disk due to filesystem deduplication, and that data is corrupted.) We expect to be able to recover this file in due course.
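
As a rough illustration of why one damaged block can affect every copy of a file, here is a toy Python model of deduplicated storage. It is a sketch only: the class `DedupStore`, its methods and the example paths are hypothetical, and it does not reflect how ZFS is actually implemented.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical data is kept only once."""

    def __init__(self):
        self.blocks = {}  # checksum -> bytes (the single stored copy)
        self.files = {}   # path -> checksum of that file's data

    def write(self, path, data):
        key = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(key, data)  # a second copy adds no new block
        self.files[path] = key

    def read(self, path):
        key = self.files[path]
        data = self.blocks[key]
        # Verify the stored bytes against the checksum recorded at write time.
        if hashlib.sha256(data).hexdigest() != key:
            raise IOError(f"checksum mismatch reading {path}")
        return data

    def corrupt(self, path):
        """Simulate on-disk damage to the (non-empty) block behind one path."""
        key = self.files[path]
        data = self.blocks[key]
        self.blocks[key] = bytes([data[0] ^ 0xFF]) + data[1:]
```

In this model, writing the same bytes under two paths stores one block; after `corrupt()` is called on either path, `read()` fails for both, while files with different contents are unaffected.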

We are in the process of copying home directories to another temporary server so that further corruption does not happen and a stable service can be restored pending arrival of new hardware for a long-term fix.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:13:07&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Personal dev-gpu/cpu VMs will now be shut down in order to complete the transfer of virtual disk storage to an alternate server. We hope that VMs will be available again within a few hours.

Home directories will remain accessible, read-only, via dev-cpu-1 and dev-gpu-1.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:21:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  VM disks have been moved to an alternate storage server; VMs should now be working as usual and you are welcome to start them using Xen Orchestra.

Home directories on the GPU VM cluster are still read-only, for now. These are being migrated off the failing server.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:18:41&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  /anfs/gpuscratch is now on an alternate server, and is writable again. If you are still unable to write to it, please try restarting autofs, or contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) and we&#039;ll help.

We have nearly completed relocation of GPU cluster home directories (and /anfs/gpucluster) to an alternate server and will post another update later this afternoon.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:01:42&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Migration of home directories is ongoing. This is taking longer than anticipated due to a combination of software issues on the systems used for data recovery; even though we have an almost-complete copy of the data, some time-consuming additional work is needed in order to bring that into service. Apologies for the ongoing disruption.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 21&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:46:10&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Personal dev-gpu/cpu VMs will be paused for a few minutes in order to make some changes to their temporary storage server, needed in order to have it also serve home directories.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 21&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:35:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Home directories on the GPU cluster are being switched across to an alternate server. Avoid rebooting/starting VMs, for the moment. The shared servers dev-gpu-1/dev-cpu-1 will be intermittently available whilst they are reconfigured. If you see an empty home directory, don&#039;t panic.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 21&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:00:29&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  Home directory storage has been migrated to a new server, and should be writable again. dev-gpu-1 and dev-cpu-1 are connected to the new storage. If your VM is currently running it may still be using the old server, i.e. you may still see a read-only home directory or other problems, but it can be connected to the new storage.

**If your VM&#039;s home directory is still read-only:** run &quot;sudo cl-update-system&quot; on your VM, wait for it to complete, then reboot the VM.

**If your VM&#039;s home directory appears empty: don&#039;t panic, your data still exists** but your VM has not mounted the right storage. Run &quot;sudo cl-update-system&quot; on your VM, wait for it to complete, then reboot the VM; if it still appears empty, email [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) \-- your VM&#039;s configuration probably just needs updating to point at the new storage, and our scripts to automatically do this for you didn&#039;t work on your VM for some reason, but we can fix it for you.
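
As a rough aid for telling which of the two cases applies, the sketch below reports whether the mount holding a directory is read-only by parsing text in the `/proc/mounts` format. It is a minimal Python sketch rather than a departmental tool: the function names are hypothetical, and it assumes a Linux VM where `/proc/mounts` is readable.

```python
import os


def mount_options(path, mounts_text):
    """Return the options of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:  # device, mount point, fs type, options, ...
            point, opts = fields[1], fields[3].split(",")
            inside = path == point or path.startswith(point.rstrip("/") + "/")
            # Later entries for the same mount point shadow earlier ones.
            if inside and len(point) >= len(best_point):
                best_point, best_opts = point, opts
    return best_opts


def is_read_only(path, mounts_text):
    """True if the mount covering path carries the ro option."""
    return "ro" in mount_options(path, mounts_text)


def home_is_read_only():
    """Check the current home directory against the live mount table."""
    with open("/proc/mounts") as f:
        return is_read_only(os.path.expanduser("~"), f.read())
```

Calling `home_is_read_only()` on a VM gives a quick yes/no answer; it only reports the mount state and does not replace running &quot;sudo cl-update-system&quot; and rebooting.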

Some users are missing a file named &quot;libcusparse.so.12&quot; or &quot;libcusparse.so.12.3.1.170&quot; (generally within a Conda environment or Python virtualenv). You can simply download or recompile this file again in the same way that you originally created or generated this file, e.g. by reinstalling cusparse. We&#039;ll be restoring most of these files soon from a pristine NVIDIA/PyPI copy wherever possible.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 21&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:01:38&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  All copies of libcusparse, the one remaining corrupted file, have been restored from matching published versions (PyPI; NVIDIA; Conda). Therefore, we believe no data has been lost during this filesystem corruption incident. If you think you may be missing any data, or are still unable to use your VM or access your home directory, please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) as soon as possible.

Further work will be needed in due course to reinforce the resilience of the temporary storage servers, and then to migrate storage from its temporary servers to a new permanent storage server once that arrives. We will be in contact about that in due course.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 22&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;10:34:43&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  We consider this incident resolved (albeit with temporary solutions); please contact [service-desk@cst.cam.ac.uk](mailto:service-desk@cst.cam.ac.uk) if you are aware of any ongoing problems. Thanks again for your patience during this highly disruptive outage.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sun, 17 Nov 2024 16:36:46 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm3ltkdhf000brp5ct46e5qoo</link>
  <guid>https://cl.instatus.com/incident/cm3ltkdhf000brp5ct46e5qoo</guid>
</item>

<item>
  <title>GPU cluster storage maintenance</title>
  <description>
    Type: Incident
    Duration: 30 minutes

    Affected Components: GPUs
    Nov 10, 20:55:20 GMT+0 - Identified - Some urgent maintenance is needed on the storage server that holds home directories on the GPU cluster (dev-gpu-\*, dev-cpu-\*). VMs will be paused for a short while whilst home directories are unavailable. Nov 10, 21:25:35 GMT+0 - Resolved - This work has been completed. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 30 minutes</p>
    <p><strong>Affected Components:</strong> </p>
    &lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 10&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;20:55:20&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Some urgent maintenance is needed on the storage server that holds home directories on the GPU cluster (dev-gpu-\*, dev-cpu-\*). VMs will be paused for a short while whilst home directories are unavailable.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 10&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:25:35&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This work has been completed.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sun, 10 Nov 2024 20:55:20 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm3c2pxsa0047pyonxbwmd21v</link>
  <guid>https://cl.instatus.com/incident/cm3c2pxsa0047pyonxbwmd21v</guid>
</item>

<item>
  <title>Network instability</title>
  <description>
    Type: Incident
    Duration: 2 hours and 38 minutes

    Affected Components: Network
    Nov 6, 14:12:09 GMT+0 - Investigating - We are currently investigating some instability on our connection to the University network. Nov 6, 16:50:32 GMT+0 - Resolved - This incident has been resolved. Nov 6, 15:00:58 GMT+0 - Monitoring - We implemented a possible workaround and are monitoring the impact. Further disruption may occur. Nov 6, 16:11:01 GMT+0 - Monitoring - The workaround has been removed with a longer-term fix now in place, and we are continuing to monitor. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 2 hours and 38 minutes</p>
    <p><strong>Affected Components:</strong> </p>
    &lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:12:09&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We are currently investigating some instability on our connection to the University network.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:50:32&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:00:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  We implemented a possible workaround and are monitoring the impact. Further disruption may occur.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Nov &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:11:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  The workaround has been removed with a longer-term fix now in place, and we are continuing to monitor.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 6 Nov 2024 14:12:09 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm35yk17j000u41hh6wb7ojfq</link>
  <guid>https://cl.instatus.com/incident/cm35yk17j000u41hh6wb7ojfq</guid>
</item>

<item>
  <title>wgb guest wifi DHCP fault</title>
  <description>
    Type: Incident
    Duration: 4 hours and 25 minutes

    Affected Components: Network
    Oct 9, 11:26:11 GMT+0 - Investigating - We have received reports that the &quot;wgb&quot; guest wireless network is not currently working. We are investigating a DHCP issue.

Members of any University can and should use &quot;eduroam&quot; instead, which is unaffected. Members of the department should use &quot;Internal-CL&quot;. Guests can use the central University guest wifi service &quot;UniOfCam-Guest&quot;. These are all available in the same places that would normally have &quot;wgb&quot; coverage. Oct 9, 12:34:34 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result. Oct 9, 15:50:46 GMT+0 - Resolved - This incident has been resolved. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 4 hours and 25 minutes</p>
    <p><strong>Affected Components:</strong> </p>
    &lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:26:11&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Investigating&lt;/strong&gt; -
  We have received reports that the &quot;wgb&quot; guest wireless network is not currently working. We are investigating a DHCP issue.

Members of any University can and should use &quot;eduroam&quot; instead, which is unaffected. Members of the department should use &quot;Internal-CL&quot;. Guests can use the central University guest wifi service &quot;UniOfCam-Guest&quot;. These are all available in the same places that would normally have &quot;wgb&quot; coverage.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:34:34&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; -
  We implemented a fix and are currently monitoring the result.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Oct &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:50:46&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  This incident has been resolved.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 9 Oct 2024 11:26:11 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm21sar1l000o10qxvvh7tbh1</link>
  <guid>https://cl.instatus.com/incident/cm21sar1l000o10qxvvh7tbh1</guid>
</item>

<item>
  <title>Filer NFS reconnections required</title>
  <description>
    Type: Incident
    Duration: 34 minutes

    Affected Components: Filer
    Aug 20, 16:50:19 GMT+0 - Identified - Due to an unintended consequence of a minor configuration change, NFS connections to filer have been disrupted on some systems. The likely symptom is an &quot;Invalid argument&quot; error when trying to access any directory on filer. A reboot should solve the problem, or if you have root access to an affected client, &quot;sudo systemctl restart autofs&quot; may also solve the problem. Aug 20, 17:24:12 GMT+0 - Resolved - There is no ongoing problem with filer, but a change at approximately 17:35 caused some clients to lose existing connections to filer as a one-off. Filer access has been fixed on the main departmental servers (slogin, web, mail, etc.). If you encounter problems accessing filer on another system and cannot fix it yourself, please contact service-desk@cl.cam.ac.uk. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 34 minutes</p>
    <p><strong>Affected Components:</strong> Filer</p>
    &lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:50:19&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Due to an unintended consequence of a minor configuration change, NFS connections to filer have been disrupted on some systems. The likely symptom is an &quot;Invalid argument&quot; error when trying to access any directory on filer. A reboot should solve the problem, or if you have root access to an affected client, &quot;sudo systemctl restart autofs&quot; may also solve the problem.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:24:12&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Resolved&lt;/strong&gt; -
  There is no ongoing problem with filer, but a change at approximately 17:35 caused some clients to lose existing connections to filer as a one-off. Filer access has been fixed on the main departmental servers (slogin, web, mail, etc.). If you encounter problems accessing filer on another system and cannot fix it yourself, please contact service-desk@cl.cam.ac.uk.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 20 Aug 2024 16:50:19 +0000</pubDate>
  <link>https://cl.instatus.com/incident/cm02nuzoq002vrk7lcukzl3i2</link>
  <guid>https://cl.instatus.com/incident/cm02nuzoq002vrk7lcukzl3i2</guid>
</item>

<item>
  <title>William Gates Building planned power outage</title>
  <description>
    Type: Maintenance
    Duration: 6 hours and 35 minutes

    Affected Components: Other Secondary Storage Systems, GPUs, Secondary VM Hosts, Caelum Console (server management), Other Internal Services, GN09
    Aug 17, 07:00:00 GMT+0 - Identified - The William Gates Building will be without power for part of Saturday 17th August 2024, due to further planned work on our electrical switch gear on the connection to the building&#039;s new solar panels. This additional shutdown is needed to rectify a problem with one of the components installed during the January shutdown.

**Nearly all IT services in the William Gates Building will be unavailable for most of the day.**

Telephones, office networking and wifi will be unavailable all day (but the building is likely to be closed in any case). Please make sure that all computers in offices are shut down - not just asleep - when you leave on Friday.

We will start shutting down servers at 8am ready for the power to be turned off at around 10am. We expect the power to come back on at approximately 1pm but it will then take some time to bring all systems back into operation.

We will unfortunately need to shut down all servers in GN09 except for a very small number of critical services such as filer and network infrastructure (which will be powered from a temporary generator), as the cooling system will be offline for several hours and temperatures would otherwise climb to unsafe levels. **This includes nearly all research servers and all GPU servers (including GPU VMs).** GN09 holds almost all of our server hardware; if you are unsure where your server is located, it is probably in GN09 and will probably be affected. (A very small number of research systems are in the West Cambridge Data Centre, and will not be affected.)

**The outage is not expected to affect core infrastructure, administrative systems or small VMs** as these are hosted in the West Cambridge Data Centre. However there is a risk that access to filer from these systems will be disrupted; we don&#039;t plan to turn filer off, but it is in GN09, its temporary electrical supply is at risk, and we may have to turn it off if it gets too hot. Where a service is replicated between multiple sites, only one instance of the service may be available (this affects most core services such as LDAP, Active Directory and VPN2).

VMs hosted by the department will stay running unless they are on the GPU VM clusters (this applies both to VMs with GPUs, and VMs with a lot of CPU cores - generally with names that contain &quot;gpu&quot;, &quot;cpu&quot; or &quot;dev&quot;).

Services hosted externally to the department, for example by UIS, will not be affected - for example Moodle, CamSIS, HPC, Exchange email, Fastmail email and the main departmental (CST) website. Aug 17, 13:34:54 GMT+0 - Completed - All infrastructure is believed to be operational again after this morning&#039;s electrical work, with the exception of power control for a small number of research servers in GN09 rack 6; a PDU has developed a hardware fault. Power is still being supplied, but cannot be turned off or on remotely. Servers can still be turned off or on via their BMCs, but if you need a server power-cycling please contact service-desk@cl.cam.ac.uk. Aug 17, 07:00:01 GMT+0 - Identified - Maintenance is now in progress Aug 19, 16:50:27 GMT+0 - Completed - The issue affecting power control of servers in GN09 rack 6 has been rectified. Aug 17, 11:54:37 GMT+0 - Identified - The electrical supply has been restored. We are in the process of restoring infrastructure and then GN09 servers. This is likely to take an hour or two. Aug 16, 11:09:32 GMT+0 - Identified - Reminder that the William Gates Building&#039;s electrical supply will be turned off tomorrow.

Please fully shut down office PCs before you leave today.

Research and teaching servers in GN09 will be turned off tomorrow morning, except where already agreed. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 6 hours and 35 minutes</p>
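    <p><strong>Affected Components:</strong> Other Secondary Storage Systems, GPUs, Secondary VM Hosts, Caelum Console (server management), Other Internal Services, GN09</p>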
    &lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;07:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The William Gates Building will be without power for part of Saturday 17th August 2024, due to further planned work on our electrical switch gear on the connection to the building&#039;s new solar panels. This additional shutdown is needed to rectify a problem with one of the components installed during the January shutdown.

**Nearly all IT services in the William Gates Building will be unavailable for most of the day.**

Telephones, office networking and wifi will be unavailable all day (but the building is likely to be closed in any case). Please make sure that all computers in offices are shut down - not just asleep - when you leave on Friday.

We will start shutting down servers at 8am ready for the power to be turned off at around 10am. We expect the power to come back on at approximately 1pm but it will then take some time to bring all systems back into operation.

We will unfortunately need to shut down all servers in GN09 except for a very small number of critical services such as filer and network infrastructure (which will be powered from a temporary generator), as the cooling system will be offline for several hours and temperatures would otherwise climb to unsafe levels. **This includes nearly all research servers and all GPU servers (including GPU VMs).** GN09 holds almost all of our server hardware; if you are unsure where your server is located, it is probably in GN09 and will probably be affected. (A very small number of research systems are in the West Cambridge Data Centre, and will not be affected.)

**The outage is not expected to affect core infrastructure, administrative systems or small VMs** as these are hosted in the West Cambridge Data Centre. However there is a risk that access to filer from these systems will be disrupted; we don&#039;t plan to turn filer off, but it is in GN09, its temporary electrical supply is at risk, and we may have to turn it off if it gets too hot. Where a service is replicated between multiple sites, only one instance of the service may be available (this affects most core services such as LDAP, Active Directory and VPN2).

VMs hosted by the department will stay running unless they are on the GPU VM clusters (this applies both to VMs with GPUs, and VMs with a lot of CPU cores - generally with names that contain &quot;gpu&quot;, &quot;cpu&quot; or &quot;dev&quot;).

Services hosted externally to the department, for example by UIS, will not be affected - for example Moodle, CamSIS, HPC, Exchange email, Fastmail email and the main departmental (CST) website.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:34:54&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  All infrastructure is believed to be operational again after this morning&#039;s electrical work, with the exception of power control for a small number of research servers in GN09 rack 6; a PDU has developed a hardware fault. Power is still being supplied, but cannot be turned off or on remotely. Servers can still be turned off or on via their BMCs, but if you need a server power-cycling please contact service-desk@cl.cam.ac.uk.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;07:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:50:27&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  The issue affecting power control of servers in GN09 rack 6 has been rectified.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:54:37&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The electrical supply has been restored. We are in the process of restoring infrastructure and then GN09 servers. This is likely to take an hour or two.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:09:32&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Reminder that the William Gates Building&#039;s electrical supply will be turned off tomorrow.

Please fully shut down office PCs before you leave today.

Research and teaching servers in GN09 will be turned off tomorrow morning, except where already agreed.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sat, 17 Aug 2024 07:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clypr9dv5308295gpn1itod19dz</link>
  <guid>https://cl.instatus.com/maintenance/clypr9dv5308295gpn1itod19dz</guid>
</item>

<item>
  <title>Database server maintenance</title>
  <description>
    Type: Maintenance
    Duration: 1 hour and 15 minutes

    Affected Components: Other Internal Services
    Aug 14, 17:31:50 GMT+0 - Identified - We are applying urgent security updates to the departmental database server and dbwebserver. Aug 14, 18:47:13 GMT+0 - Completed - The user-visible impact of this maintenance has completed; some behind-the-scenes servers continue to be updated but this should not impact any use of the database. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 hour and 15 minutes</p>
    <p><strong>Affected Components:</strong> Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 14&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:31:50&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We are applying urgent security updates to the departmental database server and dbwebserver.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Aug &lt;var data-var=&#039;date&#039;&gt; 14&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:47:13&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  The user-visible impact of this maintenance has completed; some behind-the-scenes servers continue to be updated but this should not impact any use of the database.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Wed, 14 Aug 2024 17:31:50 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clzu4qic0360242hboc1jaq61ym</link>
  <guid>https://cl.instatus.com/maintenance/clzu4qic0360242hboc1jaq61ym</guid>
</item>

<item>
  <title>UIS firewall maintenance</title>
  <description>
    Type: Maintenance
    Duration: 2 hours and 30 minutes

    Affected Components: Other Internal Services
    Jul 29, 05:00:01 GMT+0 - Identified - Maintenance is now in progress Jul 29, 07:30:00 GMT+0 - Completed - Maintenance has completed successfully Jul 29, 05:00:00 GMT+0 - Identified - UIS will be carrying out network maintenance on Monday 29 July from 6am to 8:30am (to physically reconnect a data centre firewall to a new network).

The central IT services listed below will be unavailable for 10–30 minutes during this period:

* CamSIS
* CHRIS
* CUFS
* Research dashboard
* X5
* University DNS Service

We recommend waiting until after 08:30 before logging in to the services listed above. They may come back online earlier than 8:30am, so you can try to log in if you have urgent work, but please be aware that you may experience connectivity issues.

If you experience issues accessing the services after the maintenance period, try logging out and back in. If problems persist, please [contact the UIS Service Desk](https://help.uis.cam.ac.uk/contact-us). 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 2 hours and 30 minutes</p>
    <p><strong>Affected Components:</strong> Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;05:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;07:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 29&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;05:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  UIS will be carrying out network maintenance on Monday 29 July from 6am to 8:30am (to physically reconnect a data centre firewall to a new network).

The central IT services listed below will be unavailable for 10–30 minutes during this period:

* CamSIS
* CHRIS
* CUFS
* Research dashboard
* X5
* University DNS Service

We recommend waiting until after 08:30 before logging in to the services listed above. They may come back online earlier than 8:30am, so you can try to log in if you have urgent work, but please be aware that you may experience connectivity issues.

If you experience issues accessing the services after the maintenance period, try logging out and back in. If problems persist, please [contact the UIS Service Desk](https://help.uis.cam.ac.uk/contact-us).&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 29 Jul 2024 05:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clypsyf9b339463gwn1zf5d6gbs</link>
  <guid>https://cl.instatus.com/maintenance/clypsyf9b339463gwn1zf5d6gbs</guid>
</item>

<item>
  <title>GPU cluster storage maintenance</title>
  <description>
    Type: Maintenance
    Duration: 2 hours and 2 minutes

    Affected Components: GPUs
    Jul 16, 16:54:50 GMT+0 - Identified - This work is ongoing and is likely to overrun due to an unexpected hardware problem. Jul 16, 11:00:58 GMT+0 - Identified - Reminder: this maintenance is taking place at 17:00 today and will require all dev-gpu-\* and dev-cpu-\* VMs to be shut down. Jul 16, 17:36:15 GMT+0 - Identified - The outage has overrun due to a problem encountered during the storage server&#039;s RAM upgrade. Progress is being made; we can still upgrade the RAM and restore service, just not in the way we expected to. Current estimate for restoration of service: 19:00-19:15. Jul 16, 16:00:01 GMT+0 - Identified - Maintenance is now in progress Jul 16, 18:02:10 GMT+0 - Completed - This maintenance has been completed. Personal VMs can be started via Xen Orchestra (&lt;https://xo.cl.cam.ac.uk/&gt;) as needed. Jul 16, 16:00:00 GMT+0 - Identified - The server that hosts storage for the departmental GPU cluster needs an urgent security update, and a reboot.

This will necessitate **shutting down all GPU and CPU development VMs**, dev-gpu-\* and dev-cpu-\*, including the shared servers dev-gpu-1 and dev-cpu-1\. These VMs&#039; disks, as well as associated data directories (GPU home directories and shared &quot;gpuscratch&quot; space), will be unavailable for about half an hour. As it will take time to shut down and restart the VM infrastructure, VMs will be unavailable for longer: approximately an hour.

We will take the opportunity to add RAM to the storage server too, to improve performance. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 2 hours and 2 minutes</p>
    <p><strong>Affected Components:</strong> GPUs</p>
    &lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:54:50&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  This work is ongoing and is likely to overrun due to an unexpected hardware problem.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;11:00:58&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Reminder: this maintenance is taking place at 17:00 today and will require all dev-gpu-\* and dev-cpu-\* VMs to be shut down.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:36:15&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The outage has overrun due to a problem encountered during the storage server&#039;s RAM upgrade. Progress is being made; we can still upgrade the RAM and restore service, just not in the way we expected to. Current estimate for restoration of service: 19:00-19:15.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:02:10&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  This maintenance has been completed. Personal VMs can be started via Xen Orchestra (&lt;https://xo.cl.cam.ac.uk/&gt;) as needed.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 16&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The server that hosts storage for the departmental GPU cluster needs an urgent security update, and a reboot.

This will necessitate **shutting down all GPU and CPU development VMs**, dev-gpu-\* and dev-cpu-\*, including the shared servers dev-gpu-1 and dev-cpu-1\. These VMs&#039; disks, as well as associated data directories (GPU home directories and shared &quot;gpuscratch&quot; space), will be unavailable for about half an hour. As it will take time to shut down and restart the VM infrastructure, VMs will be unavailable for longer: approximately an hour.

We will take the opportunity to add RAM to the storage server too, to improve performance.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 16 Jul 2024 16:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clyhbk6h540139h4ofuxre5d0a</link>
  <guid>https://cl.instatus.com/maintenance/clyhbk6h540139h4ofuxre5d0a</guid>
</item>

<item>
  <title>archive-smb maintenance</title>
  <description>
    Type: Maintenance
    Duration: 23 minutes

    Affected Components: Archive Server
    Jul 9, 21:23:21 GMT+0 - Completed - Maintenance has completed successfully. Jul 9, 21:00:00 GMT+0 - Identified - As mentioned in last week&#039;s incident, we were awaiting availability of an urgent update to the software on the new archive server. That update is now available, and will be installed on archive-smb this evening. Expect approximately 15 minutes&#039; outage to \\\\archive-smb.cl.cam.ac.uk (as well as scy27\_traces and corelab\_datasets).

This update is necessary to ensure the security of the new archive server. Jul 9, 21:00:01 GMT+0 - Identified - Maintenance is now in progress 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 23 minutes</p>
    <p><strong>Affected Components:</strong> Archive Server</p>
    &lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:23:21&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  As mentioned in last week&#039;s incident, we were awaiting availability of an urgent update to the software on the new archive server. That update is now available, and will be installed on archive-smb this evening. Expect approximately 15 minutes&#039; outage to \\\\archive-smb.cl.cam.ac.uk (as well as scy27\_traces and corelab\_datasets).

This update is necessary to ensure the security of the new archive server.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 9&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;21:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Tue, 9 Jul 2024 21:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clyeryttp54783ihn1l6gikq5r</link>
  <guid>https://cl.instatus.com/maintenance/clyeryttp54783ihn1l6gikq5r</guid>
</item>

<item>
  <title>Archive SMB service migration</title>
  <description>
    Type: Maintenance
    Duration: 2 days, 5 hours and 31 minutes

    Affected Components: Archive Server
    Jul 4, 09:00:01 GMT+0 - Identified - Maintenance is now in progress Jul 4, 20:54:27 GMT+0 - Identified - All affected shares / volumes should now be available again.

As previously communicated, the server name for SMB volumes has changed. Where you previously used a path starting with \\\\archive.cl.cam.ac.uk, you will now need to use **\\\\archive-smb.cl.cam.ac.uk**.

Also as previously communicated, ownership of files and directories has been reset. Each share has been assigned an owning user or group, based on who we believe is using the share. The owning user or group has full control over the contents of the share; anyone not in the group has read-only access. Detailed permissions (ACLs) and other extended attributes have been removed. If you find that the new permissions are not behaving as you expect, please contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk).

You are welcome to add ACLs to your shares again if you wish. They will be stored in a new format behind the scenes, but should work the same as they always did.

Some further work is ongoing, for example to restore the use of snapshots.

Please note that we anticipate having to install a software update on the new archive server within the next few days (unfortunately it was not available in time to install today), which will cause another brief outage.

**Users of corelab\_datasets** can now access their volume from Linux via NFS through the path **/auto/archive/corelab\_datasets**. If it doesn&#039;t work, you may need to restart autofs once on your client: if you have root access, use &quot;sudo systemctl restart autofs&quot;. Please discontinue the use of the old SMB-based path (/smb/...). In due course we will tidy up the SMB-based setup from those clients on which it has been set up. Thanks for your patience whilst we worked to improve the use of this volume for Linux users. Windows and Mac users can continue to use SMB via the new path **\\\\archive-smb.cl.cam.ac.uk\\corelab\_datasets**.

**Users of scy27\_traces** can continue to use **/auto/archive/scy27\_traces** but will now need a Kerberos ticket to do so. If it doesn&#039;t work, you may need to restart autofs once on your client: if you have root access, use &quot;sudo systemctl restart autofs&quot;. You can also access the volume via SMB if needed. Jul 6, 14:31:20 GMT+0 - Completed - Maintenance has completed successfully. Jul 4, 09:00:00 GMT+0 - Identified - The SMB (Windows-style) storage service on [archive.cl.cam.ac.uk](http://archive.cl.cam.ac.uk) will be moving to a new server and a new hostname. Users will need to change how they access the service. NFS users will be briefly impacted too at this time (and then the NFS service will move to the new server later). Details will be sent out by email and posted here in due course.

The work will begin on 4th July, with follow-up work possibly extending into 5th July. Times are approximate at this stage but you should consider the service to be unavailable all day. Jul 4, 12:07:08 GMT+0 - Identified - The part of today&#039;s work affecting NFS users of archive (other than scy27\_traces which is a special case) is now complete. NFS volumes are available again. If you have problems with the archive NFS service, contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk).

SMB / Windows-style volumes are currently unavailable and will be set up on the new archive server (archive-smb) during the rest of today and tomorrow. Jun 17, 18:00:55 GMT+0 - Identified - _Text of the email announcement:_

This message is meant for researchers who store data on the **archive server, archive.cl.cam.ac.uk**. Professional services staff can ignore this announcement. Researchers who aren&#039;t sure whether they use archive can see below for some guidance on how to tell.

**The archive server, archive.cl.cam.ac.uk, is being replaced.** The server will be unavailable on one or two occasions (first one: **4th-5th July**) whilst this work takes place. **You may then need to change how you access the server.**

### **How do I know if I am using archive?**

Most people are **not** using archive. However if you contacted us asking for backup space or more than a few hundred gigabytes of storage at any point during the last several years, we might have provided you with storage on archive. We would generally have discussed this with you when creating the storage, though in some cases that would have been many years ago.

The archive server is **not** related to filer or bigdisc.

Generally any access to archive will be through the hostname **“archive.cl.cam.ac.uk” or simply “archive”**, or paths including the component **&quot;archive&quot; or &quot;gfxdisp&quot;**. (The server name &quot;berilia&quot; is also involved; if you are using anything referring to &quot;berilia&quot; please contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) as this will stop working.)

Archive provides two separate services: NFS (Linux/UNIX style) shares and SMB (Windows-style) shares. (SMB may also be referred to as CIFS. SMB can be accessed from Linux clients too, though mostly it&#039;s intended for Windows.) At the moment, both services use the same server name, archive.cl.cam.ac.uk. We will be migrating both services to a new server, but on different days. However, **all** users of archive will be impacted to some extent on that day, and **almost all** users of archive will need to change how they use the service.

**You are using archive’s SMB (Windows-style) service, a.k.a. CIFS, if any of the following applies:**

* You are using paths that look similar to any of the following:  
   * **\\\\archive\\_something_** or **\\\\archive.cl.cam.ac.uk\\_something_** (Windows UNC paths)  
   * **smb://archive.cl.cam.ac.uk/_something_** or **smb://archive/_something_** (Linux/Mac file browser SMB paths)  
   * **//archive.cl.cam.ac.uk/_something_** or **//archive/_something_** (UNC paths as used by some Linux clients such as “mount -t smbfs” or “mount -t cifs”)
* You are using “**corelab\_datasets**” (this is a special case; see below)
* You are using “**scy27\_traces**” (this is a special case; see below)

If any of these apply, see &quot;Impact on SMB users&quot; below; expect a long period of disruption on 4th July, perhaps extending into 5th July, and to have to change how you access archive.

**You are using archive’s NFS (Linux/UNIX-style) service if any of the following applies:**

* You are manually mounting storage from **archive.cl.cam.ac.uk:/export/_something_ or archive:/export/_something_**, or have one of those paths in **/etc/fstab**
* You are using paths under **/auto/archive or /net/archive or /anfs/gfxdisp or /auto/anfs/gfxdisp**

If any of these apply, see &quot;Impact on NFS users&quot; below; expect a short disruption on 4th July, then another email about a longer disruption a month or two later.

### **Impact on SMB users** 

The SMB service will move first, on **4th July**. It will be unavailable for several hours on that day, potentially with disruption extending to the next day due to the large amount of work needed to reinstate this service on a new server.

**Immediately after the work on 4th July, the archive SMB service will move to archive-smb.cl.cam.ac.uk.** You will need to update any path that you use that currently refers to the SMB service on archive.cl.cam.ac.uk.

**Access permissions on SMB shares will be reset!** For each SMB share, we have identified the current users and will grant those users read and write access to the entire share. Every other member of the department will be given read access (in order to minimise disruption, as we believe there is no confidential data on archive). **If you have confidential data on archive that must not be readable by other users, please contact** [**sys-admin@cl.cam.ac.uk**](mailto:sys-admin@cl.cam.ac.uk) **immediately.**

Any custom permissions that may have been set on individual files and folders will be lost. Unfortunately it is not feasible to transfer permissions from the old server to the new one, as we are switching software platform (from Spectra Verde to TrueNAS) and the two platforms store Windows-style ACLs in different, incompatible ways.

Whilst the permissions are being rebuilt on 4th-5th July – which is a manual process – you may find that you are unable to access your storage. We will try to restore read access first; write access may take longer.

**If you are using corelab\_datasets**: This is a special case; it is an SMB (Windows) share, but several members of your group access it from managed Linux systems via a temporary mechanism. We plan to make this share available via both SMB and NFS, to better support access from both operating systems. We will reconfigure each managed Linux machine on which we’ve set up access to this dataset to use NFS, once this volume is available via NFS. Until then, you will be unable to access corelab\_datasets from Linux.

**If you are using scy27\_traces**: As discussed with the users (RT ticket #135996), this is currently an NFS share but is being migrated along with the SMB shares so that we can enable SMB access in future. It will therefore experience the same disruption as SMB shares, described above, including the reset of permissions. On 4th July it will also start to require a Kerberos ticket for NFS access.

### **Impact on NFS users**

Firstly, on **4th July** the NFS service will be unavailable for a short while (approximately 30-60 minutes) as the whole archive server will need to be shut down in order to move the disks holding SMB data to another server. Preparatory work has already been completed to try to expedite this process; however there is a chance of unexpected problems with bringing the NFS service back online on the old server (due to complications with the old server&#039;s obsolete software platform). We will prioritise bringing the NFS service back up as quickly as possible on 4th July.

Secondly, there will be a more disruptive outage of archive’s NFS service in a month or two, when we move that service to the new hardware; the timeline for this is not yet decided.

But if you are accessing an archive NFS filesystem from a Lab-managed Linux/UNIX system, you can **prepare now** for the migration by ensuring that you are **not using a path starting /net/archive**. If you are using /net/archive/export/_SHARE_ please switch to **/auto/archive/_SHARE_** as soon as convenient.
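
If you have scripts or configuration files that mention the old-style path, a small helper along the following lines could translate them. This is only a sketch under stated assumptions: the function name `to_auto_path` and the share name `myshare` used in the example are hypothetical, and the mapping simply follows the /net/archive/export/_SHARE_ to /auto/archive/_SHARE_ rule described above.

```python
from pathlib import PurePosixPath

# Old and new path prefixes, taken from the announcement text.
OLD_PREFIX = PurePosixPath("/net/archive/export")
NEW_PREFIX = PurePosixPath("/auto/archive")


def to_auto_path(path):
    """Map /net/archive/export/SHARE/... to /auto/archive/SHARE/...

    Any path that is not under the old prefix is returned unchanged.
    """
    try:
        # relative_to raises ValueError when path is not under OLD_PREFIX.
        rest = PurePosixPath(path).relative_to(OLD_PREFIX)
    except ValueError:
        return path
    return str(NEW_PREFIX / rest)
```

For example, `to_auto_path("/net/archive/export/myshare/data")` gives `/auto/archive/myshare/data`, and a path already under /auto/archive is returned as it was.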

If you are accessing an archive NFS filesystem from a non-Lab-managed computer by manually mounting archive\[.cl.cam.ac.uk\]:/export/SHARE, we suggest that you email [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) now to ask that we help you to set up the Lab automounter on your computer. If you don’t, you’ll have to change to a different path when we move the NFS service.

At the moment, NFS shares on archive only support “sec=sys” (IP-address-based access); they do not use any form of strong authentication. After the upgrade, we will be in a position to optionally authenticate NFS using Kerberos, as we do on filer. Contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) if you are interested in switching your archive NFS share to Kerberos.

### **Updates during the work**

Updates on progress during the work (as well as a copy of this announcement) will be posted at &lt;https://cl.instatus.com/clxj8zjas110017bkln1e6boqwu&gt;. You can subscribe on that page if you would like to receive updates by email.

Please contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) if you have any questions. Thanks. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 2 days, 5 hours and 31 minutes</p>
    <p><strong>Affected Components:</strong> </p>
    &lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 4&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;09:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 4&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;20:54:27&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  All affected shares / volumes should now be available again.

As previously communicated, the server name for SMB volumes has changed. Where you previously used a path starting with \\\\archive.cl.cam.ac.uk, you will now need to use **\\\\archive-smb.cl.cam.ac.uk**.

Also as previously communicated, ownership of files and directories has been reset. Each share has been assigned an owning user or group, based on who we believe is using the share. The owning user or group has full control over the contents of the share; anyone not in the group has read-only access. Detailed permissions (ACLs) and other extended attributes have been removed. If you find that the new permissions are not behaving as you expect, please contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk).

You are welcome to add ACLs to your shares again if you wish. They will be stored in a new format behind the scenes, but should work the same as they always did.

Some further work is ongoing, for example to restore the use of snapshots.

Please note that we anticipate having to install a software update on the new archive server within the next few days (unfortunately it was not available in time to install today), which will cause another brief outage.

**Users of corelab\_datasets** can now access their volume from Linux via NFS through the path **/auto/archive/corelab\_datasets**. If it doesn&#039;t work, you may need to restart autofs once on your client: if you have root access, use &quot;sudo systemctl restart autofs&quot;. Please discontinue the use of the old SMB-based path (/smb/...). In due course we will tidy up the SMB-based setup from those clients on which it has been set up. Thanks for your patience whilst we worked to improve the use of this volume for Linux users. Windows and Mac users can continue to use SMB via the new path **\\\\archive-smb.cl.cam.ac.uk\\corelab\_datasets**.

**Users of scy27\_traces** can continue to use **/auto/archive/scy27\_traces** but will now need a Kerberos ticket to do so. If it doesn&#039;t work, you may need to restart autofs once on your client: if you have root access, use &quot;sudo systemctl restart autofs&quot;. You can also access the volume via SMB if needed.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;14:31:20&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 4&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;09:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The SMB (Windows-style) storage service on [archive.cl.cam.ac.uk](http://archive.cl.cam.ac.uk) will be moving to a new server and a new hostname. Users will need to change how they access the service. NFS users will be briefly impacted too at this time (and then the NFS service will move to the new server later). Details will be sent out by email and posted here in due course.

The work will begin on 4th July, with follow-up work possibly extending into 5th July. Times are approximate at this stage but you should consider the service to be unavailable all day.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jul &lt;var data-var=&#039;date&#039;&gt; 4&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;12:07:08&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The part of today&#039;s work affecting NFS users of archive (other than scy27\_traces which is a special case) is now complete. NFS volumes are available again. If you have problems with the archive NFS service, contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk).

SMB / Windows-style volumes are currently unavailable and will be set up on the new archive server (archive-smb) during the rest of today and tomorrow.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jun &lt;var data-var=&#039;date&#039;&gt; 17&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;18:00:55&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  _Text of the email announcement:_

This message is meant for researchers who store data on the **archive server, archive.cl.cam.ac.uk**. Professional services staff can ignore this announcement. Researchers who aren&#039;t sure whether they use archive can see below for some guidance on how to tell.

**The archive server, archive.cl.cam.ac.uk, is being replaced.** The server will be unavailable on one or two occasions (first one: **4th-5th July**) whilst this work takes place. **You may then need to change how you access the server.**

### **How do I know if I am using archive?**

Most people are **not** using archive. However if you contacted us asking for backup space or more than a few hundred gigabytes of storage at any point during the last several years, we might have provided you with storage on archive. We would generally have discussed this with you when creating the storage, though in some cases that would have been many years ago.

The archive server is **not** related to filer or bigdisc.

Generally any access to archive will be through the hostname **“archive.cl.cam.ac.uk” or simply “archive”**, or paths including the component **&quot;archive&quot; or &quot;gfxdisp&quot;**. (The server name &quot;berilia&quot; is also involved; if you are using anything referring to &quot;berilia&quot; please contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) as this will stop working.)

Archive provides two separate services: NFS (Linux/UNIX-style) shares and SMB (Windows-style) shares. (SMB may also be referred to as CIFS. SMB can be accessed from Linux clients too, though mostly it&#039;s intended for Windows.) At the moment, both services use the same server name, archive.cl.cam.ac.uk. We will be migrating both services to a new server, but on different days. However, **all** users of archive will be impacted to some extent on the first of these days, 4th July, and **almost all** users of archive will need to change how they use the service.

**You are using archive’s SMB (Windows-style) service, a.k.a. CIFS, if any of the following applies:**

* You are using paths that look similar to any of the following:  
   * **\\\\archive\\_something_** or **\\\\archive.cl.cam.ac.uk\\_something_** (Windows UNC paths)  
   * **smb://archive.cl.cam.ac.uk/_something_** or **smb://archive/_something_** (Linux/Mac file browser SMB paths)  
   * **//archive.cl.cam.ac.uk/_something_** or **//archive/_something_** (UNC paths as used by some Linux clients such as “mount -t smbfs” or “mount -t cifs”)
* You are using “**corelab\_datasets**” (this is a special case; see below)
* You are using “**scy27\_traces**” (this is a special case; see below)

If any of these apply, see &quot;Impact on SMB users&quot; below. Expect a long period of disruption on 4th July, perhaps extending into 5th July; you will also need to change how you access archive.

**You are using archive’s NFS (Linux/UNIX-style) service if any of the following applies:**

* You are manually mounting storage from **archive.cl.cam.ac.uk:/export/_something_ or archive:/export/_something_**, or have one of those paths in **/etc/fstab**
* You are using paths under **/auto/archive or /net/archive or /anfs/gfxdisp or /auto/anfs/gfxdisp**

If any of these apply, see &quot;Impact on NFS users&quot; below; expect a short disruption on 4th July, then another email about a longer disruption a month or two later.

### **Impact on SMB users** 

The SMB service will move first, on **4th July**. It will be unavailable for several hours on that day, potentially with disruption extending to the next day due to the large amount of work needed to reinstate this service on a new server.

**Immediately after the work on 4th July, the archive SMB service will move to archive-smb.cl.cam.ac.uk.** You will need to update any path that you use that currently refers to the SMB service on archive.cl.cam.ac.uk.

**Access permissions on SMB shares will be reset!** For each SMB share, we have identified the current users and will grant those users read and write access to the entire share. Every other member of the department will be given read access (in order to minimise disruption, as we believe there is no confidential data on archive). **If you have confidential data on archive that must not be readable by other users, please contact** [**sys-admin@cl.cam.ac.uk**](mailto:sys-admin@cl.cam.ac.uk) **immediately.**

Any custom permissions that may have been set on individual files and folders will be lost. Unfortunately it is not feasible to transfer permissions from the old server to the new one, as we are switching software platform (from Spectra Verde to TrueNAS) and the two platforms store Windows-style ACLs in different, incompatible ways.

Whilst the permissions are being rebuilt on 4th-5th July – which is a manual process – you may find that you are unable to access your storage. We will try to restore read access first; write access may take longer.

**If you are using corelab\_datasets**: This is a special case; it is an SMB (Windows) share, but several members of your group access it from managed Linux systems via a temporary mechanism. We plan to make this share available via both SMB and NFS, to better support access from both operating systems. We will reconfigure each managed Linux machine on which we’ve set up access to this dataset to use NFS, once this volume is available via NFS. Until then, you will be unable to access corelab\_datasets from Linux.

**If you are using scy27\_traces**: As discussed with the users (RT ticket #135996), this is currently an NFS share but is being migrated along with the SMB shares so that we can enable SMB access in future. It will therefore experience the same disruption as SMB shares, described above, including the reset of permissions. On 4th July it will also start to require a Kerberos ticket for NFS access.

### **Impact on NFS users**

Firstly, on **4th July** the NFS service will be unavailable for a short while (approximately 30-60 minutes) as the whole archive server will need to be shut down in order to move the disks holding SMB data to another server. Preparatory work has already been completed to try to expedite this process; however there is a chance of unexpected problems with bringing the NFS service back online on the old server (due to complications with the old server&#039;s obsolete software platform). We will prioritise bringing the NFS service back up as quickly as possible on 4th July.

Secondly, there will be a more disruptive outage of archive’s NFS service in a month or two, when we move that service to the new hardware; the timeline for this is not yet decided.

But if you are accessing an archive NFS filesystem from a Lab-managed Linux/UNIX system, you can **prepare now** for the migration by ensuring that you are **not using a path starting /net/archive**. If you are using /net/archive/export/_SHARE_ please switch to **/auto/archive/_SHARE_** as soon as convenient.

If you are accessing an archive NFS filesystem from a non-Lab-managed computer by manually mounting archive\[.cl.cam.ac.uk\]:/export/SHARE, we suggest that you email [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) now to ask that we help you to set up the Lab automounter on your computer. If you don’t, you’ll have to change to a different path when we move the NFS service.

At the moment, NFS shares on archive only support “sec=sys” (IP-address-based access); they do not use any form of strong authentication. After the upgrade, we will be in a position to optionally authenticate NFS using Kerberos, as we do on filer. Contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) if you are interested in switching your archive NFS share to Kerberos.

### **Updates during the work**

Updates on progress during the work (as well as a copy of this announcement) will be posted at &lt;https://cl.instatus.com/clxj8zjas110017bkln1e6boqwu&gt;. You can subscribe on that page if you would like to receive updates by email.

Please contact [sys-admin@cl.cam.ac.uk](mailto:sys-admin@cl.cam.ac.uk) if you have any questions. Thanks.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Thu, 4 Jul 2024 09:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clxj8zjas110017bkln1e6boqwu</link>
  <guid>https://cl.instatus.com/maintenance/clxj8zjas110017bkln1e6boqwu</guid>
</item>

<item>
  <title>DS-Print maintenance</title>
  <description>
    Type: Maintenance
    Duration: 1 hour

    Affected Components: Other Internal Services
    May 20, 06:30:01 GMT+0 - Identified - Maintenance is now in progress.
    May 20, 07:30:00 GMT+0 - Completed - Maintenance has completed successfully.
    May 20, 06:30:00 GMT+0 - Identified - UIS has announced scheduled maintenance on the DS-Print service on the morning of Monday 20th May. Printing to the DS-Print printers in the Department will not be possible between 7:30am and 8:30am, and multi-function devices will be unresponsive. The Department&#039;s locally managed printers will be unaffected.

UIS are replacing a security certificate and installing an update for one of the technologies we use to provide the service. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 hour</p>
    <p><strong>Affected Components:</strong> Other Internal Services</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;06:30:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;07:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 20&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;06:30:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  UIS has announced scheduled maintenance on the DS-Print service on the morning of Monday 20th May. Printing to the DS-Print printers in the Department will not be possible between 7:30am and 8:30am, and multi-function devices will be unresponsive. The Department&#039;s locally managed printers will be unaffected.

UIS are replacing a security certificate and installing an update for one of the technologies we use to provide the service.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 20 May 2024 06:30:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clwau9agz56773nanflht857yr</link>
  <guid>https://cl.instatus.com/maintenance/clwau9agz56773nanflht857yr</guid>
</item>

<item>
  <title>WGB emergency network maintenance</title>
  <description>
    Type: Maintenance
    Duration: 27 minutes

    Affected Components: Network
    May 6, 22:42:06 GMT+0 - Completed - Maintenance has completed successfully.
    May 6, 22:15:00 GMT+0 - Identified - We will be updating the software on the core router/switch in the William Gates Building (gatwick) in order to attempt to mitigate the ongoing crashes (&lt;https://cl.instatus.com/clvva1e4b43187b8n2hqywstc0&gt;). This upgrade cannot be performed &quot;live&quot;, so there will be approximately 20-30 minutes&#039; outage of the William Gates Building office network, and of filer. Other servers in GN09 should be largely unaffected.
    May 6, 22:15:01 GMT+0 - Identified - Maintenance is now in progress.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 27 minutes</p>
    <p><strong>Affected Components:</strong> Network</p>
    &lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:42:06&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:15:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We will be updating the software on the core router/switch in the William Gates Building (gatwick) in order to attempt to mitigate the ongoing crashes (&lt;https://cl.instatus.com/clvva1e4b43187b8n2hqywstc0&gt;). This upgrade cannot be performed &quot;live&quot;, so there will be approximately 20-30 minutes&#039; outage of the William Gates Building office network, and of filer. Other servers in GN09 should be largely unaffected.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;May &lt;var data-var=&#039;date&#039;&gt; 6&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;22:15:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Mon, 6 May 2024 22:15:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clvvhcbjw234652b8n2qzzczn2y</link>
  <guid>https://cl.instatus.com/maintenance/clvvhcbjw234652b8n2qzzczn2y</guid>
</item>

<item>
  <title>RT upgrade</title>
  <description>
    Type: Maintenance
    Duration: 7 hours and 8 minutes

    Affected Components: Request Tracker
    Apr 19, 16:00:01 GMT+0 - Identified - Maintenance is now in progress.
    Apr 19, 16:00:00 GMT+0 - Identified - We will be upgrading RT during this time. Email to sys-admin, building-services and other email addresses that use ticket numbers will be delayed and not acted upon until the upgrade is complete.

The upgrade may take all weekend as it involves a time-consuming database conversion.
    Apr 19, 23:08:21 GMT+0 - Completed - Maintenance has completed successfully.
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 7 hours and 8 minutes</p>
    <p><strong>Affected Components:</strong> Request Tracker</p>
    &lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:01&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Maintenance is now in progress.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;16:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  We will be upgrading RT during this time. Email to sys-admin, building-services and other email addresses that use ticket numbers will be delayed and not acted upon until the upgrade is complete.

The upgrade may take all weekend as it involves a time-consuming database conversion.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Apr &lt;var data-var=&#039;date&#039;&gt; 19&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;23:08:21&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  Maintenance has completed successfully.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Fri, 19 Apr 2024 16:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clv143q2732849bqoibwz3kima</link>
  <guid>https://cl.instatus.com/maintenance/clv143q2732849bqoibwz3kima</guid>
</item>

<item>
  <title>William Gates Building planned power outage</title>
  <description>
    Type: Maintenance
    Duration: 1 day, 5 hours and 44 minutes

    Affected Components: Network, Other Secondary Storage Systems, GPUs, Secondary VM Hosts, Other Internal Services, GN09
    Jan 14, 08:00:00 GMT+0 - Identified - The electrical work is in progress.  Systems in GN09 including GPU VMs will remain off until the work is complete, tentatively estimated for 16:00.  After that it will take some hours to fully restore all systems.
    Jan 15, 13:44:02 GMT+0 - Completed - We think that (except where we&#039;re already in communication with the affected users about a specific issue) everything is back to normal after the planned electrical shutdown.  Please contact sys-admin if you notice any issues.
    Jan 14, 17:15:17 GMT+0 - Identified - Power has been restored to the building.  It will now take some time, perhaps hours, to restore all systems starting with core infrastructure.  Please be patient if your system remains unavailable.
    Jan 14, 15:54:20 GMT+0 - Identified - Revised estimate on the restoration of power to the building: 17:00-17:30.
    Jan 14, 20:03:02 GMT+0 - Identified - Datacentre infrastructure has been restored.  Owners of servers can now start them via the Caelum console (if access is set up); owners of GPU/CPU development VMs can start them via Xen Orchestra as usual.  Contact sys-admin if any needed system is down or misbehaving.
    Jan 13, 17:00:00 GMT+0 - Identified - The William Gates Building will be without power all day on Sunday 14th January 2024, due to planned work on our electrical switch gear to connect our new solar panels.  This is the second and final shutdown planned as part of the solar panel installation.

**Nearly all IT services in the William Gates Building will be unavailable for roughly 24 hours, perhaps longer.**  We will start shutting systems down on the evening of Saturday 13th January ready for the power to be turned off the following morning; we expect the power to come back on during the evening of Sunday 14th January but it will then take some time to bring all systems back into operation.  We expect most services to be available by Monday morning, but there is a small chance that a few things won&#039;t initially be working properly on Monday.

Telephones, office networking and wifi will be unavailable all day on Sunday (but the building is likely to be closed in any case).  Please make sure that all computers in offices are shut down (not just asleep) before Saturday evening.

Due to the longer outage this time, we will unfortunately need to shut down all servers in GN09 except for a very small number of critical services such as filer, as the cooling system will be offline all day and temperatures would otherwise climb to unsafe levels.

**This includes nearly all research servers and all GPU servers (including GPU VMs).**  GN09 holds almost all of our server hardware; if you are unsure where your server is located, it is probably in GN09 and will probably be affected.  (A very small number of research systems are in the West Cambridge Data Centre, and will not be affected.)

**The outage is not expected to affect core infrastructure, administrative systems or small VMs** as these are hosted in the West Cambridge Data Centre.  However there is a risk that access to filer from these systems will be disrupted; we don&#039;t plan to turn filer off, but it is in GN09 and we may have to act if it gets too hot.  Where a service is replicated between multiple sites, only one instance of the service may be available (this affects most core services such as LDAP, Active Directory and VPN2).

VMs hosted by the department will stay running unless they are on the GPU VM clusters (this applies both to VMs with GPUs, and VMs with a lot of CPU cores - generally with names that contain &quot;gpu&quot; or &quot;cpu&quot;).

Services hosted outside the department, for example by UIS, will not be affected. These include Moodle, CamSIS, HPC, Exchange email, Fastmail email and the main departmental (CST) website. 
  </description>
  <content:encoded>
    <![CDATA[<p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 day, 5 hours and 44 minutes</p>
    <p><strong>Affected Components:</strong> Network, Other Secondary Storage Systems, GPUs, Secondary VM Hosts, Other Internal Services, GN09</p>
    &lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 14&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;08:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The electrical work is in progress.  Systems in GN09 including GPU VMs will remain off until the work is complete, tentatively estimated for 16:00.  After that it will take some hours to fully restore all systems.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 15&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;13:44:02&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Completed&lt;/strong&gt; -
  We think that (except where we&#039;re already in communication with the affected users about a specific issue) everything is back to normal after the planned electrical shutdown.  Please contact sys-admin if you notice any issues.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 14&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:15:17&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Power has been restored to the building.  It will now take some time, perhaps hours, to restore all systems starting with core infrastructure.  Please be patient if your system remains unavailable.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 14&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;15:54:20&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Revised estimate on the restoration of power to the building: 17:00-17:30.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 14&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;20:03:02&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  Datacentre infrastructure has been restored.  Owners of servers can now start them via the Caelum console (if access is set up); owners of GPU/CPU development VMs can start them via Xen Orchestra as usual.  Contact sys-admin if any needed system is down or misbehaving.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Jan &lt;var data-var=&#039;date&#039;&gt; 13&lt;/var&gt;, &lt;var data-var=&#039;time&#039;&gt;17:00:00&lt;/var&gt; GMT+0&lt;/small&gt;&lt;br&gt;&lt;strong&gt;Identified&lt;/strong&gt; -
  The William Gates Building will be without power all day on Sunday 14th January 2024, due to planned work on our electrical switch gear to connect our new solar panels.  This is the second and final shutdown planned as part of the solar panel installation.

**Nearly all IT services in the William Gates Building will be unavailable for roughly 24 hours, perhaps longer.**  We will start shutting systems down on the evening of Saturday 13th January ready for the power to be turned off the following morning; we expect the power to come back on during the evening of Sunday 14th January but it will then take some time to bring all systems back into operation.  We expect most services to be available by Monday morning, but there is a small chance that a few things won&#039;t initially be working properly on Monday.

Telephones, office networking and wifi will be unavailable all day on Sunday (but the building is likely to be closed in any case).  Please make sure that all computers in offices are shut down (not just asleep) before Saturday evening.

Due to the longer outage this time, we will unfortunately need to shut down all servers in GN09 except for a very small number of critical services such as filer, as the cooling system will be offline all day and temperatures would otherwise climb to unsafe levels.

**This includes nearly all research servers and all GPU servers (including GPU VMs).**  GN09 holds almost all of our server hardware; if you are unsure where your server is located, it is probably in GN09 and will probably be affected.  (A very small number of research systems are in the West Cambridge Data Centre, and will not be affected.)

**The outage is not expected to affect core infrastructure, administrative systems or small VMs** as these are hosted in the West Cambridge Data Centre.  However there is a risk that access to filer from these systems will be disrupted; we don&#039;t plan to turn filer off, but it is in GN09 and we may have to act if it gets too hot.  Where a service is replicated between multiple sites, only one instance of the service may be available (this affects most core services such as LDAP, Active Directory and VPN2).

VMs hosted by the department will stay running unless they are on the GPU VM clusters (this applies both to VMs with GPUs, and VMs with a lot of CPU cores - generally with names that contain &quot;gpu&quot; or &quot;cpu&quot;).

Services hosted outside the department, for example by UIS, will not be affected. These include Moodle, CamSIS, HPC, Exchange email, Fastmail email and the main departmental (CST) website.&lt;/p&gt;
]]>
  </content:encoded>
  <pubDate>Sat, 13 Jan 2024 17:00:00 +0000</pubDate>
  <link>https://cl.instatus.com/maintenance/clq5kv1te0337b4og6nxsjcpz</link>
  <guid>https://cl.instatus.com/maintenance/clq5kv1te0337b4og6nxsjcpz</guid>
</item>

  </channel>
  </rss>