We have completed our root cause analysis (RCA) for this incident. The summary is below:
Impact:
Any UI page that displayed runtime-related information was significantly disrupted, and users saw incomplete or unavailable data.
Detection:
This issue was reported to us by customers.
Root Cause:
An API change had an unexpected side effect: the event handler no longer recognized runtime events as runtimes and instead treated them as generic entities. When the change was reverted, the entries in the generic-entities collection stopped being updated, and an automatic cleanup function then caused some UI data queries to return incorrect data.
Resolution:
After resolving the root cause, we rebuilt the required data and reinitialized the runtime information. As a result of this incident, we have identified improvements to our end-to-end (E2E) testing process and monitoring systems, which we will be implementing.