Postmortem: Logto auth service outage
On June 12, 2025, Logto services on `logto.app` were briefly disrupted by a Cloudflare outage affecting request routing. The issue was resolved quickly, with no impact on data security or core services.
Incident summary
Between 18:07 and 18:58 UTC on June 12, 2025, users accessing Logto services via the `logto.app` domain (including custom domains) experienced errors. The disruption was due to an outage in Cloudflare Workers KV, which impacted our routing layer. Logto Cloud services and Logto Console, which use direct DNS resolution and do not depend on Cloudflare Workers, were not affected. Service was restored within an hour, with no impact on data security.
Timeline (UTC)
- 18:07: Logto auth service APIs began returning 500 errors for requests via `logto.app`.
- 18:24: Investigation confirmed the Azure backend was healthy and isolated the issue to Cloudflare Workers and KV.
- 18:48: Cloudflare officially acknowledged an incident affecting Workers and KV.
- 18:58: We deployed a temporary workaround by removing the caching logic, which restored service with possible minor performance degradation.
- 21:00: After Cloudflare services stabilized, we redeployed the cache logic with a graceful fallback. Full performance was restored and the service is now resilient to similar KV outages.
Root cause
This incident was triggered by downtime in Cloudflare Workers KV. Our Cloudflare Worker routes requests to the correct Logto region for each tenant or domain to ensure proper data residency and compliance. To improve performance, the Worker uses KV to cache these region mappings. When KV became unavailable, cache operations failed and the Worker threw errors instead of falling back to a no-cache behavior, causing service disruption.
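To illustrate that failure mode, the sketch below shows how an unguarded KV dependency in a Worker's routing path can turn a cache outage into request failures. The binding name, helper function, and endpoints are hypothetical and not Logto's actual code; the `KVNamespace` type comes from `@cloudflare/workers-types`.

```ts
// Minimal sketch of the pre-fix pattern (illustrative only; names are made up).
// KVNamespace is provided by @cloudflare/workers-types.

interface Env {
  REGION_CACHE: KVNamespace; // assumed KV binding for tenant/domain -> region mappings
}

// Hypothetical direct lookup against an internal routing service.
async function resolveRegion(host: string): Promise<string> {
  const res = await fetch(
    `https://routing.internal.example/region?host=${encodeURIComponent(host)}`,
  );
  return res.text();
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const host = new URL(request.url).hostname;

    // The KV read is awaited with no error handling, so during a KV outage
    // this call rejects and the whole request surfaces as a 500.
    let region = await env.REGION_CACHE.get(host);

    if (!region) {
      region = await resolveRegion(host);
      // The cache write fails the same way while KV is down.
      await env.REGION_CACHE.put(host, region, { expirationTtl: 300 });
    }

    // Hypothetical proxying to the resolved regional deployment.
    const url = new URL(request.url);
    url.hostname = `${region}.origin.example`;
    return fetch(new Request(url.toString(), request));
  },
};
```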
Logto Cloud services and Logto Console were not affected because they rely on direct DNS resolution and do not use Cloudflare Workers for routing.
Resolution and improvements
- Removed the caching dependency from the Worker, restoring service.
- After Cloudflare KV recovered, redeployed the cache logic with a graceful fallback: if the cache is unavailable, service continues using direct routing without disruption (see the sketch after this list).
- Ongoing improvements to infrastructure to further increase resilience and availability.
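The graceful fallback can be sketched roughly as below: the cache is treated as strictly optional, so any KV failure degrades to a direct lookup rather than failing the request. This is an illustration under the same assumptions as the previous sketch, not Logto's actual implementation.

```ts
// Sketch of a cache-with-graceful-fallback lookup (illustrative names only).
// KVNamespace is provided by @cloudflare/workers-types.

async function getRegion(
  host: string,
  cache: KVNamespace,
  resolveRegion: (host: string) => Promise<string>, // direct, uncached lookup
): Promise<string> {
  try {
    const cached = await cache.get(host);
    if (cached) return cached;
  } catch {
    // KV outage: ignore and fall through to direct routing,
    // exactly as if the cache had missed.
  }

  const region = await resolveRegion(host);

  try {
    await cache.put(host, region, { expirationTtl: 300 });
  } catch {
    // Cache writes are best-effort; a failed write must not fail the request.
  }

  return region;
}
```

With this shape, a KV outage only costs the cache's latency benefit; the request path itself no longer depends on KV being available.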
Impact
- Users accessing Logto via `logto.app` experienced errors for about 50 minutes.
- No customer data was lost or compromised.
- Logto Cloud services and Logto Console remained fully operational.
Next steps
- We will review and improve our error handling in edge infrastructure.
- We will explore using multiple vendors for upstream infrastructure to avoid single points of failure.
Thank you for your patience and support.