University of Cambridge Computer Laboratory - Lab GPU clusters: poor I/O performance – Incident details

Lab GPU clusters: poor I/O performance

Resolved
Degraded performance
Started 10 months ago
Lasted about 2 hours

Affected

Virtual Machine Hosting

Degraded performance from 4:54 PM to 4:54 PM, Operational from 4:54 PM to 6:48 PM

GPUs

Degraded performance from 4:54 PM to 4:54 PM, Operational from 4:54 PM to 6:48 PM

Updates
  • Resolved

    We believe this issue has been resolved, and are in communication with the user who accidentally caused the poor performance. We're also considering hardware upgrades for the storage server.

  • Monitoring

    We are aware of performance problems on the storage server that hosts dev-gpu/dev-cpu VM disks and data.

    One cause, a very high I/O load originating from a user VM, has been found and temporarily mitigated. We will continue to monitor the situation, as load still seems higher than expected.