Truv outage
Incident Report for Truv
Postmortem

Introduction

On May 4, 2023 at 17:28 UTC, our system experienced a major outage. We understand that this outage caused significant disruption and frustration for our users, and we sincerely apologize for the impact it had on you.

Root Cause

During a routine certificate renewal, a certificate was deployed to the production environment together with unintended configuration changes that affected how production web traffic was served. This misconfiguration resulted in errors and service disruptions for affected users.

Remediation

We took several corrective actions, including reverting the configuration change and adding checks to prevent similar incidents.

Preventative Measures

To prevent similar incidents, the incident response team will implement the following measures:

  • Implement additional quality control checks in the certificate deployment process
  • Implement additional checks when scheduling configuration changes for the production environment
  • Conduct additional training for personnel involved in certificate deployment
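
As an illustration only, one form a quality control check in the certificate deployment process could take is a comparison of the current and proposed configuration that flags any change outside the certificate fields. The sketch below is a hypothetical example, not a description of Truv's actual tooling: the function name, the flat dictionary layout, and the keys `tls_cert`, `tls_key`, `listen_port`, and `upstream` are all assumptions made for the example.

```python
from typing import Any, Iterable, List, Mapping

# Sentinel so that a key missing on one side counts as a change.
_MISSING = object()


def unintended_changes(
    current: Mapping[str, Any],
    proposed: Mapping[str, Any],
    allowed_keys: Iterable[str],
) -> List[str]:
    """Return the sorted keys that differ between the two configurations
    and are not in the set of keys a certificate renewal may change."""
    changed = {
        key
        for key in current.keys() | proposed.keys()
        if current.get(key, _MISSING) != proposed.get(key, _MISSING)
    }
    return sorted(changed - set(allowed_keys))


# Hypothetical configurations: only the certificate and key should change.
current_config = {
    "tls_cert": "cert-2022.pem",
    "tls_key": "key-2022.pem",
    "listen_port": 443,
    "upstream": "web-pool",
}
proposed_config = {
    "tls_cert": "cert-2023.pem",
    "tls_key": "key-2023.pem",
    "listen_port": 8443,  # unintended change riding along with the renewal
    "upstream": "web-pool",
}

violations = unintended_changes(
    current_config, proposed_config, allowed_keys={"tls_cert", "tls_key"}
)
if violations:
    print("Blocking deployment; unexpected changes in:", ", ".join(violations))
```

A check like this could run before a renewal is scheduled for production, so that a deployment carrying anything other than the new certificate is stopped for review.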

Conclusion

We take our responsibility to provide reliable and secure services very seriously, and we apologize for the disruption this outage caused. We are committed to preventing similar incidents, and we appreciate your patience and understanding as we work to improve the reliability and resiliency of our systems.

Posted May 04, 2023 - 19:45 UTC

Resolved
This incident has been resolved.
Posted May 04, 2023 - 18:15 UTC
Monitoring
A fix has been deployed, and we are monitoring results.
Posted May 04, 2023 - 18:00 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted May 04, 2023 - 17:32 UTC
Update
We are continuing to investigate this issue.
Posted May 04, 2023 - 17:28 UTC
Investigating
We are experiencing degraded performance. We are currently investigating.
Posted May 04, 2023 - 17:28 UTC
This incident affected: API, Dashboard, and Truv Bridge.