Pipelines are missing
Incident Report for Codefresh
Postmortem

Incident Symptoms

Around 14:50 UTC on Friday, August 13, 2021, our users found pipelines missing from their accounts.

Impact and Root Cause

The incident was triggered by a human error during a routine database operation, which corrupted pipeline data for a portion of our user base. Customers and internal teams detected the problem immediately, and we identified the root cause within 15 minutes.

Investigation

The team started working on mitigation and restoration by 15:15 UTC. At that point, we observed performance issues in our hosted database vendor's infrastructure, which caused unexpected delays in the restoration process.

Around 19:20 UTC, after confirming that we could not resolve the delays on our vendor's side, we recommended running builds manually as a workaround so that any necessary changes could still be deployed.

Resolution

Our team continued to work with our database vendor to speed up the restoration and fully restored the pipelines around 00:30 UTC on August 14. Pipelines created during the outage were left in place, neither replaced nor removed, so that service could continue.

Post Mortem

  1. We have since adjusted our processes to automate these routine database procedures.
  2. Most of the delay in resolution was due to an incident at our database vendor, which slowed the restoration. We have worked with the vendor to improve this process so that future restores are significantly faster.

Please reach out to support@codefresh.io if you have any questions.

Posted Aug 17, 2021 - 23:05 UTC

Resolved
This incident has been resolved.
Posted Aug 14, 2021 - 02:47 UTC
Monitoring
We have restored all pipelines and are now fully operational.
We will continue monitoring the system.
Posted Aug 14, 2021 - 00:40 UTC
Update
We have restored around 80% of all pipelines, and you should now start seeing them in your accounts.
The remaining pipelines are still in the process of being restored.
Posted Aug 13, 2021 - 23:58 UTC
Update
We are verifying the fix to this issue. We should be pushing this fix to production shortly.
Posted Aug 13, 2021 - 22:52 UTC
Update
We are continuing to work towards resolving this issue and recommend manual deployments or new pipelines for any urgent needs. Current and new pipelines will not be affected by the resolution being implemented for the missing pipelines.
Posted Aug 13, 2021 - 19:21 UTC
Identified
The issue has been identified.
The ETA for implementing the fix is 4-8 hours.
Posted Aug 13, 2021 - 16:05 UTC
Investigating
We are currently investigating this issue.
Posted Aug 13, 2021 - 14:48 UTC
This incident affected: Codefresh Systems (codefresh.io).