Post-Incident Report & Root Cause Analysis
On 8th December, Orca's AWS asset-visibility pipeline experienced an unexpected interruption that temporarily affected how a limited subset of AWS assets appeared within the platform.
Some customers observed a brief period where certain assets appeared missing and later reappeared as newly discovered or updated assets.
This document provides a high-level explanation of what occurred, the customer-facing impact, and the long-term corrective actions we have taken.
What Happened
Orca's cloud-asset modeling relies on a combination of an AWS-hosted public endpoint and internal logic to determine service availability.
During the incident window, an unexpected change in the output of that AWS-hosted endpoint caused Orca to incorrectly treat some AWS services as temporarily unavailable.
As a result, some assets that relied on those API responses were not modeled during that cycle, which led them to appear temporarily unavailable in the platform.
Once our engineering team identified and resolved the issue, a full re-scan of affected accounts restored complete and accurate asset visibility. However, the re-scan also recreated some assets as if they were newly discovered.
Customer Impact
The impact was limited to asset visibility only and affected a small subset of AWS accounts. No security, monitoring, alert logic, or runtime protection was disabled.
Root Cause
The underlying cause was the interaction of two factors:
- An AWS service change that affected Orca's visibility into the availability of a certain API.
- Orca's monitoring, which worked correctly and halted the affected modeling to prevent propagation of incorrect data, but in doing so temporarily suppressed the affected assets.
While the detection mechanisms behaved as designed, this combination created a unique scenario where asset visibility was interrupted before full context was available.
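For readers who want a concrete picture of this interaction, the following is a minimal, purely illustrative sketch. It is not Orca's implementation. The function names, the shape of the endpoint output, and the tri-state model are all assumptions introduced here. It shows one way an availability check could separate "confirmed unavailable" from "output not understood", so that an unexpected upstream format keeps the last-known assets visible instead of suppressing them.

```python
from enum import Enum


class Availability(Enum):
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"
    UNKNOWN = "unknown"  # endpoint output could not be interpreted


def classify(endpoint_output: dict, service: str) -> Availability:
    """Map a hypothetical endpoint output to a tri-state availability value."""
    value = endpoint_output.get(service)
    if value is True:
        return Availability.AVAILABLE
    if value is False:
        return Availability.UNAVAILABLE
    # Missing key or unexpected type: do not assume the service is down.
    return Availability.UNKNOWN


def model_cycle(endpoint_output: dict, last_known: dict, discovered: dict) -> dict:
    """Build the per-service asset view for one modeling cycle (hypothetical)."""
    view = {}
    for service, previous in last_known.items():
        status = classify(endpoint_output, service)
        if status is Availability.AVAILABLE:
            # Normal path: use freshly discovered assets.
            view[service] = discovered.get(service, [])
        elif status is Availability.UNKNOWN:
            # Uncertain upstream data: keep last-known assets visible.
            view[service] = previous
        else:
            # Confirmed unavailable: nothing to model for this service.
            view[service] = []
    return view
```

In this sketch, a service whose availability value arrives in an unrecognized form keeps its previously modeled assets for that cycle, while a service that is reported as available is refreshed as usual.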
Resolution
Our engineering team implemented a fix to restore stable modeling logic and eliminate reliance on the affected API behavior. Scanning was re-enabled after validation and a full asset refresh was completed.
Impacted assets should now have reappeared and be accurately represented.
Closing Statement
We understand that uninterrupted asset visibility is essential for operational awareness and downstream automations.
While this type of upstream behavior change is uncommon, our alerting and monitoring acted as intended, halting propagation of uncertain data until the issue was understood and resolved.
Our R&D teams are already working on enhanced safeguards to reduce the likelihood of similar issues in the future.
We acknowledge the temporary inconvenience caused, particularly around "new asset" entries and alerts, and we are committed to ensuring greater resilience in the future.