Postmortem: unexpected JWT `iss` change

Incident report for the 2024-03-18 unexpected JWT `iss` change.
Sijie, Developer
March 25, 2024

Summary

On 2024-03-18, an update changing JWT issuer behavior in Logto Cloud broke auth flows for users with custom domains and iss validation. The fix required these users to update their validation logic.

  • Affected users: users who have custom domains enabled and perform iss validation.
  • Severity: Critical, breaking iss validation within auth flows.

Root cause

The update changed the iss field to match the requested domain, breaking existing validations that expected the previous default issuer.

Timeline

  • 2024-03-18 10:00 (UTC): updates deployed, changing iss behavior.
  • 2024-03-18 23:30 (UTC): received the first user report that existing behavior had broken.
  • 2024-03-19 12:00 (UTC): confirmed the issue and started to investigate.
  • 2024-03-19 14:00 (UTC): identified the root cause and impact.
  • 2024-03-19 20:00 (UTC): prepared the email to affected users.
  • 2024-03-20 06:00 (UTC): sent emails to all the affected users.

Impact analysis

Details of the release

Logto Cloud supports custom domains for authentication. Developers whose tenants have a custom domain enabled can set the endpoint in their SDKs to the custom domain, and end users then use this endpoint to start the auth process and obtain tokens. Some tokens are JWTs, which include an iss field indicating the issuer of the token. Previously, even when a custom domain endpoint was used to request an access token, the issuer still defaulted to our standard domain ([tenant-id].logto.app).

However, the issuer’s domain should match the requested endpoint. We released an update to fix this, and the iss field now automatically reflects the domain used in the request.
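The difference between the old and new behavior can be illustrated with a minimal sketch. This is not Logto's actual implementation: the function names, the tenant ID `abc123`, the custom domain `auth.example.com`, and the exact issuer URL format (scheme plus host, with no path) are all assumptions made for illustration.

```typescript
// Hypothetical illustration of the issuer change; not Logto's actual code.
// The issuer URL format used here (scheme + host, no path) is an assumption.

type TokenRequest = {
  tenantId: string;
  requestedHost: string; // host the client used to request the token
};

// Before the update: the issuer was always the default tenant domain.
const issuerBefore = ({ tenantId }: TokenRequest): string =>
  `https://${tenantId}.logto.app`;

// After the update: the issuer reflects the domain used in the request.
const issuerAfter = ({ requestedHost }: TokenRequest): string =>
  `https://${requestedHost}`;

// A token request made through a (hypothetical) custom domain.
const request: TokenRequest = {
  tenantId: "abc123",
  requestedHost: "auth.example.com",
};

console.log(issuerBefore(request)); // https://abc123.logto.app
console.log(issuerAfter(request)); // https://auth.example.com
```

In this sketch, a request made through the default domain yields the same issuer before and after, which is consistent with only custom-domain users being affected.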

For developers who already use a custom domain to grant tokens and have implemented iss validation in their resource server, this is a breaking change: existing auth checks fail because the issuer has changed. To fix this, developers need to update the validation code and replace the expected issuer with the new one that uses the custom domain.
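As a rough sketch of what this looks like in a resource server, the snippet below checks the iss claim of a token that is assumed to have already been signature-verified and decoded by a JWT library. The `JwtClaims` type, the `hasExpectedIssuer` helper, and both issuer values are hypothetical; substitute the issuer your own tenant actually returns.

```typescript
// Sketch only: checks the `iss` claim of an already verified and decoded token.
// Signature, expiry, and audience checks are assumed to happen elsewhere.

type JwtClaims = { iss?: unknown; [claim: string]: unknown };

// Hypothetical values; replace with your own tenant and custom domain.
const previousIssuer = "https://abc123.logto.app"; // default domain, expected before the update
const currentIssuer = "https://auth.example.com"; // custom domain used to request tokens

// Returns true only when the token's issuer exactly matches the expected one.
const hasExpectedIssuer = (claims: JwtClaims, expectedIssuer: string): boolean =>
  claims.iss === expectedIssuer;

// Claims of a token requested through the custom domain after the update.
const claims: JwtClaims = { iss: "https://auth.example.com", sub: "user-1" };

console.log(hasExpectedIssuer(claims, previousIssuer)); // false: the old check rejects the token
console.log(hasExpectedIssuer(claims, currentIssuer)); // true: the updated check accepts it
```

Many JWT libraries accept an expected issuer as a verification option, in which case updating that configured value is likely all that is needed.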

We failed to fully consider the impact on existing iss validations. As a result, this release became a breaking change without prior notification.

Resolution

We notified affected users via email and advised them to update their iss validation to match the requested domain.

Rollbacks?

The change is a necessary fix for the issuer field, and some users may have already adapted to the new behavior. A rollback would cause further confusion and inconsistency.

Lessons learned

  • Code changes affecting core authentication must have sign-off by the team in addition to regular reviews.
  • Automated tests should cover more cases, especially cloud-specific scenarios.

Corrective and preventative measures

  • Add integration tests: Add test cases to cover the scenario in this incident.
  • Feature monitoring projects: In addition to Logto Cloud, create our own side projects and deeply integrate with Logto to catch potential issues before releases.