Completed
William Gates Building planned power outage

Status
Resolved after 2 days
Started
January 13, 2024 at 5:00 PM
Completed
January 14, 2024 at 10:44 PM
Affects
Network
Internal Services
Other Internal Services
Virtual Machine Hosting
GPUs
Secondary VM Hosts
Datacentres
GN09
Data Storage
Other Secondary Storage Systems
  • Completed
    January 15, 2024 at 1:44 PM

    We believe everything is back to normal after the planned electrical shutdown, except where we are already in communication with affected users about a specific issue. Please contact sys-admin if you notice any issues.

  • Update
    January 14, 2024 at 8:03 PM
    In progress

    Datacentre infrastructure has been restored. Owners of servers can now start them via the Caelum console (if access is set up); owners of GPU/CPU development VMs can start them via Xen Orchestra as usual. Contact sys-admin if any needed system is down or misbehaving.

  • Update
    January 14, 2024 at 5:15 PM
    In progress

    Power has been restored to the building. It will now take some time, perhaps hours, to restore all systems, starting with core infrastructure. Please be patient if your system remains unavailable.

  • Update
    January 14, 2024 at 3:54 PM
    In progress

    Revised estimate on the restoration of power to the building: 17:00-17:30.

  • In progress
    January 14, 2024 at 8:00 AM

    The electrical work is in progress. Systems in GN09 including GPU VMs will remain off until the work is complete, tentatively estimated for 16:00. After that it will take some hours to fully restore all systems.

  • Planned
    January 13, 2024 at 5:00 PM

    The William Gates Building will be without power all day on Sunday 14th January 2024, due to planned work on our electrical switchgear to connect our new solar panels. This is the second and final shutdown planned as part of the solar panel installation.

    Nearly all IT services in the William Gates Building will be unavailable for roughly 24 hours, perhaps longer. We will start shutting systems down on the evening of Saturday 13th January ready for the power to be turned off the following morning; we expect the power to come back on during the evening of Sunday 14th January but it will then take some time to bring all systems back into operation. We expect most services to be available by Monday morning, but there is a small chance that a few things won't initially be working properly on Monday.

    Telephones, office networking and wifi will be unavailable all day on Sunday (but the building is likely to be closed in any case). Please make sure that all computers in offices are shut down (not just asleep) before Saturday evening.

    Due to the longer outage this time, we will unfortunately need to shut down all servers in GN09 except for a very small number of critical services such as filer, as the cooling system will be offline all day and temperatures would otherwise climb to unsafe levels.

    This includes nearly all research servers and all GPU servers (including GPU VMs). GN09 holds almost all of our server hardware; if you are unsure where your server is located, it is probably in GN09 and will probably be affected. (A very small number of research systems are in the West Cambridge Data Centre, and will not be affected.)

    The outage is not expected to affect core infrastructure, administrative systems or small VMs, as these are hosted in the West Cambridge Data Centre. However, there is a risk that access to filer from these systems will be disrupted; we don't plan to turn filer off, but it is in GN09 and we may have to act if it gets too hot. Where a service is replicated between multiple sites, only one instance of the service may be available (this affects most core services such as LDAP, Active Directory and VPN2).

    VMs hosted by the department will stay running unless they are on the GPU VM clusters (this applies both to VMs with GPUs, and VMs with a lot of CPU cores - generally with names that contain "gpu" or "cpu").
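
    As a rough illustration of the naming rule above, the following sketch flags VM names that suggest a VM is on the GPU VM clusters. The function name and the example VM names are hypothetical, and the case-insensitive match is an assumption; the name is only a general indicator, so check with sys-admin if in doubt.

```python
def likely_on_gpu_vm_cluster(vm_name: str) -> bool:
    """Heuristic only: True if the VM name contains "gpu" or "cpu".

    Assumption: matching is case-insensitive. The notice says names
    "generally" follow this pattern, so exceptions are possible.
    """
    name = vm_name.lower()
    return "gpu" in name or "cpu" in name


if __name__ == "__main__":
    # Hypothetical VM names, for illustration only.
    for name in ["dev-gpu-01", "bigcpu-a", "web-small"]:
        if likely_on_gpu_vm_cluster(name):
            print(f"{name}: probably on the GPU VM clusters (will be shut down)")
        else:
            print(f"{name}: probably stays running")
```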

    Services hosted externally to the department, for example by UIS, will not be affected. These include Moodle, CamSIS, HPC, Exchange email, Fastmail email and the main departmental (CST) website.