OC11 Data Center Outage - Multiple Services Affected
Incident Report for MIT
Resolved
This incident has been resolved.
Posted 3 months ago. Jun 28, 2018 - 17:44 EDT
Update
IS&T staff continue to work to restore services. TSM is now fully functional, and backups to backup-L, oc11-bk-ent-1, and all TSM servers should occur as scheduled tonight.
Posted 3 months ago. Jun 25, 2018 - 16:14 EDT
Update
IS&T staff worked throughout the day on Sunday to restore power to additional services where possible.
The TSM backup servers oc11-bk-ent-1 and backup-L are up and available for restores, but are operating in a degraded mode, and not running scheduled backups. IS&T currently hopes to have TSM restored to full capacity by the end of business on Monday.
Posted 3 months ago. Jun 24, 2018 - 20:26 EDT
Update
Server Backups: The TSM (Tivoli Storage Manager) servers backup-L, oc11-bk-ent-1, backup-e, and oc11-bk-ops-1 have been affected by this outage. IS&T staff have good reason to believe that any backups made to these (or other) TSM servers prior to 5:30pm on Friday are safely stored on disk or tape. Only a small amount of this stored data (generally very recent backups) actually resides in OC11-7, and that storage media was successfully powered on Saturday afternoon. These four TSM servers remain off and rely on some networking equipment that got wet. IS&T plans to evaluate that gear on Sunday to determine the best course of action. Desktop backups using CrashPlan were not affected by this outage.
Posted 3 months ago. Jun 23, 2018 - 22:55 EDT
Update
Power remains unavailable to one half of the OC11-7 data center. Technicians will be working this evening and throughout the night to repair systems such that power can be safely restored, and IS&T will assess conditions and provide an update Sunday morning.
Posted 3 months ago. Jun 23, 2018 - 17:42 EDT
Update
Network and core infrastructure services have been restored to the OC11-7 data center. IS&T continues to work with the Markley Group on restoring power to the other half of the data center and will next provide an update at 4pm.
Posted 3 months ago. Jun 23, 2018 - 14:18 EDT
Update
Power has been restored to one side of the OC11-7 data center and IS&T teams are on site to check equipment and begin restoring service. IS&T will next provide an update at 3pm.
Posted 3 months ago. Jun 23, 2018 - 13:04 EDT
Update
Work continues to restore power to the OC11-7 data center. Due to the nature of the damage, restoring power is more complex than anticipated and will not commence until 12:00pm at the earliest. IS&T will next provide an update at 1:00pm.
Posted 3 months ago. Jun 23, 2018 - 11:13 EDT
Monitoring
Email service has been restored for all users.

Electrical crews will be working through the night to run an alternate power feed that bypasses the damaged electrical equipment. The estimated time to restore power is 10am Saturday, June 23, at the earliest. Once power is returned to the data center, IS&T staff will begin the process of restoring access to affected services.
Posted 3 months ago. Jun 22, 2018 - 23:00 EDT
Update
Email service has been fully restored (with the exception of a handful of users), and is running out of an alternate location.

Electrical crews will be working through the night to run an alternate power feed that bypasses the damaged electrical equipment. The estimated time to restore power is 10am Saturday, June 23, at the earliest. Once power is returned to the data center, IS&T staff will begin the process of restoring access to affected services.
Posted 3 months ago. Jun 22, 2018 - 22:27 EDT
Identified
Email service remains impacted by a fire and power outage in the OC11 data center. Estimated time for full restoration of email service is now 10pm.
Posted 3 months ago. Jun 22, 2018 - 21:13 EDT
Investigating
An issue affecting the OC11 data center has impacted multiple services, including Exchange email, TSM backups, non-production environments for MITSIS, and others. IS&T is investigating the issue and anticipates email service will be fully restored by 7:30pm.
Posted 3 months ago. Jun 22, 2018 - 18:40 EDT
This incident affected: General, Email, and Network.