Slow /.well-known/openid-configuration endpoints #17
Replies: 10 comments 10 replies
-
It indeed looks like something happened in that upgrade, yes. Looking at the example trace for the GET request indicates that the problem is outside of the IdentityServer pipeline.*

*) To be 100% correct, there are a few infrastructure-level things that are done outside of that block. The first is that the activity only fires if the path matches an IdentityServer endpoint => the endpoint resolution happens outside of the block. It's a simple for loop with only in-memory dependencies and I cannot imagine how that would take close to a second. Also, if you are using the dynamic providers feature, handling of those is outside of that block as well.

My overall feeling here, based on the diagnostics shared, is that it's something happening before or after the actual IdentityServer code is invoked. Did you do any code changes as part of the upgrade? Any changes to infrastructure?
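For reference, one quick way to see how much of a request actually falls inside those activities is to attach an `ActivityListener` and log the duration of the IdentityServer spans. A minimal sketch follows; the "Duende.IdentityServer" source-name prefix is an assumption and should be checked against the telemetry documentation for the version in use.

```csharp
using System;
using System.Diagnostics;

// Minimal sketch: print the duration of IdentityServer activities so they can be
// compared with the total request time reported by App Insights.
// NOTE: the "Duende.IdentityServer" source-name prefix is an assumption - verify
// it against the telemetry docs for the IdentityServer version in use.
var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name.StartsWith("Duende.IdentityServer"),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
        Console.WriteLine($"{activity.OperationName}: {activity.Duration.TotalMilliseconds:F1} ms")
};
ActivitySource.AddActivityListener(listener);
```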
-
Thanks for your response. There is no pre/post processing there, but I'll try to get more diagnostics from the .NET Core pipeline. The strange thing is that it's only affecting one region in Azure - North Europe has about 20% of West Europe's traffic but no issues at all. I'll try to reproduce locally, as we see the problem in 3 different environments. I'll also further investigate transitive dependencies. Also note, all other protocol endpoints (/introspect, /token etc.) behave normally. It's just the well-known one.
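One low-tech way to narrow that down (purely a sketch, not code from this thread) is a stopwatch middleware registered in Program.cs immediately before UseIdentityServer(), so the time spent inside IdentityServer and everything downstream of it can be compared with the end-to-end time App Insights reports:

```csharp
// Hypothetical diagnostic middleware - register it immediately before
// UseIdentityServer() so the measured time covers IdentityServer and everything
// downstream of it, but not the proxy or earlier middleware.
app.Use(async (context, next) =>
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    await next();
    sw.Stop();

    if (context.Request.Path.StartsWithSegments("/.well-known"))
    {
        app.Logger.LogInformation(
            "Downstream pipeline for {Path} took {ElapsedMs} ms",
            context.Request.Path, sw.ElapsedMilliseconds);
    }
});

app.UseIdentityServer();
```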
-
Checking App Insights today, we've seen the following: the /.well-known endpoints have very high latency, while other Duende protocol endpoints (Introspect, Token) increase as well, just not as much. I'll investigate further within the ASP.NET Core pipeline, but could you check on your end as well, @AndersAbel, to see if there is a difference between the endpoints? We're using YARP as a proxy in front of IdentityServer; I'll check if there is a problem there.
-
Thanks for sharing those stats. The /.well-known endpoints are actually the ones I consider simplest in their implementation. There is less code to run and less storage/database access. The token endpoint in comparison is more complex, but also (as far as I remember) utilizes all the storage/config that the /.well-known endpoints do. I do not doubt that this is a problem, but to properly troubleshoot we would need full activity traces that show timing all the way from the client's request to how it is handled on the server side. The only tangible data point we have so far is the one I referenced above, and that one shows that the execution of the discovery endpoint class only takes up a fraction of the total time. Are you using the dynamic providers feature?
-
I've opened another issue for YARP and, after collecting metrics and checking with the team there, it seems that YARP does not cause the latency. Metrics suggest that the request is immediately forwarded to the network stack. We do not use dynamic providers. As I've mentioned above, we started seeing the increased latency on the /.well-known endpoints; in comparison, the other protocol endpoints are far less affected. I'll enable ASP.NET Core telemetry to see where the additional time comes from.
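For anyone wanting to do the same, a minimal sketch of the tracing wiring is below, assuming the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore and OpenTelemetry.Exporter.Console packages; the console exporter is only a stand-in for whichever backend is actually used.

```csharp
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Sketch: emit a server-side span per request so the time inside ASP.NET Core
// can be compared with what the caller (and the proxy) observes.
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("identityserver"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddConsoleExporter()); // replace with the exporter for your monitoring backend

var app = builder.Build();
// ...rest of the pipeline...
```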
-
I think that is the right next step. Right now we don't know where the time is spent, and for any performance issue, metrics are the only way to solve it. There are things in the discovery endpoint as well as in the IdentityServer endpoint selection/routing that could potentially cause timing issues (never say never in these cases until it is proven). The only thing I can say is that the numbers shared so far indicate that the issue is outside of the IdentityServer middleware. That doesn't mean IdentityServer is not to blame - we won't know until we have metrics that show where the issue is.
-
I'll set up metrics logging; that will require some code changes to use the new OpenTelemetry packages. We noticed that those long-running requests to /.well-known/openid-configuration come in pairs, within a range of 10 ms, from the same client: one of them finishes within the expected latency, the other takes ~1 sec, as if there is a lock / resource contention.
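For reference, in .NET 8 ASP.NET Core publishes its request-duration histogram on the Microsoft.AspNetCore.Hosting meter, so the metrics side could look roughly like the sketch below; the package names and exporter are assumptions, not the actual setup used here.

```csharp
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);

// Sketch: export the built-in http.server.request.duration histogram so the
// per-route latency distribution (including /.well-known) becomes visible.
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.AspNetCore.Hosting")         // http.server.request.duration
        .AddMeter("Microsoft.AspNetCore.Server.Kestrel")  // connection/queue metrics
        .AddConsoleExporter());                           // placeholder exporter
```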
-
(note: we're moving this issue to our new community discussions)
-
IdentityServer 7.2 was just released with a preview feature that can cache the output of the discovery endpoint. Can you please try that and see if it makes a difference?
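For anyone trying the preview feature, it is enabled through the IdentityServer options; the property names below are assumptions based on the 7.2 release notes and should be verified against the current documentation.

```csharp
builder.Services.AddIdentityServer(options =>
{
    // Preview feature in IdentityServer 7.2: cache the discovery document output.
    // NOTE: the option names below are assumptions taken from the 7.2 release
    // notes - check the official docs for the exact names and defaults.
    options.Preview.EnableDiscoveryDocumentCache = true;
    options.Preview.DiscoveryDocumentCacheDuration = TimeSpan.FromMinutes(5);
});
// ...clients, resources and stores are registered as before...
```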
-
Which version of Duende IdentityServer are you using?
7.1.0
Which version of .NET are you using?
.NET 8
Describe the bug
The endpoints
/.well-known/openid-configuration
/.well-known/openid-configuration/jwks
have a 99th percentile latency of ~1 sec, with peaks of up to 20 secs around 5:45 AM GMT+1
App Insights Performance (screenshot)
Trace (screenshot)
During the ~20 sec peaks, other endpoints slow down considerably as well
To Reproduce
Deploy to Azure App Service
Expected behavior
99th percentile latency of ~20 ms.
Log output/exception with stacktrace
Additional context
Our infrastructure consists of Azure App Services in West and North Europe, load balanced through Azure Front Door. Our Azure SQL server is Business Critical Gen5 with 8 vCores (40 GB RAM). The PaaS resource usage is less than 5%.
We found DuendeArchive/Support#1361 but no improvement occurred after updating Azure.Core to 1.44.
The data protection is configured with EF Core and protected with Azure Key Vault.
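For context, a data-protection setup like the one described (key ring persisted via EF Core, protected with a Key Vault key) typically looks roughly like the sketch below; MyDataProtectionDbContext and the key URI are placeholders, and the assumed packages are Microsoft.AspNetCore.DataProtection.EntityFrameworkCore and Azure.Extensions.AspNetCore.DataProtection.Keys.

```csharp
using Azure.Identity;
using Microsoft.AspNetCore.DataProtection;

// Sketch, not the actual configuration from this issue: the key ring is persisted
// via an EF Core DbContext (MyDataProtectionDbContext is a placeholder that
// implements IDataProtectionKeyContext) and wrapped with an Azure Key Vault key.
builder.Services.AddDataProtection()
    .PersistKeysToDbContext<MyDataProtectionDbContext>()
    .ProtectKeysWithAzureKeyVault(
        new Uri("https://my-vault.vault.azure.net/keys/dataprotection"),
        new DefaultAzureCredential());
```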
NuGet Versions with ~20ms 99th percentile (Pre 8th January)
NuGet Versions with ~1sec 99th percentile (Post 8th January)
We have added aggressive caching with a custom DiscoveryResponseGenerator and are now seeing the following behavior: between 14:30 and 14:40 I ran a dummy app polling the discovery endpoint at an interval, and latency was perfectly fine there. After stopping the dummy app, the regular traffic calling the endpoint started seeing latencies of ~1 sec again.

The relevant caching code for CreateDiscoveryDocumentAsync is here; CreateJwkDocumentAsync is similarly implemented.
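Since the linked snippet is not reproduced here, the sketch below shows roughly what such a caching wrapper could look like. The IDiscoveryResponseGenerator method shapes follow recent Duende versions, and the cache duration, key names and registration are illustrative assumptions rather than the code referenced above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Duende.IdentityServer.Models;
using Duende.IdentityServer.ResponseHandling;
using Microsoft.Extensions.Caching.Memory;

// Rough sketch of a caching decorator over the default generator - not the code
// referenced in this issue. Method signatures assume recent Duende versions.
public class CachingDiscoveryResponseGenerator : IDiscoveryResponseGenerator
{
    private static readonly TimeSpan CacheDuration = TimeSpan.FromMinutes(5);

    private readonly DiscoveryResponseGenerator _inner;
    private readonly IMemoryCache _cache;

    public CachingDiscoveryResponseGenerator(DiscoveryResponseGenerator inner, IMemoryCache cache)
    {
        _inner = inner;
        _cache = cache;
    }

    public async Task<Dictionary<string, object>> CreateDiscoveryDocumentAsync(string baseUrl, string issuerUri)
    {
        var document = await _cache.GetOrCreateAsync($"disco:{issuerUri}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = CacheDuration;
            return _inner.CreateDiscoveryDocumentAsync(baseUrl, issuerUri);
        });
        return document!;
    }

    public async Task<IEnumerable<JsonWebKey>> CreateJwkDocumentAsync()
    {
        var keys = await _cache.GetOrCreateAsync("disco:jwks", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = CacheDuration;
            return _inner.CreateJwkDocumentAsync();
        });
        return keys!;
    }
}

// One possible registration (after AddIdentityServer has registered the defaults):
// builder.Services.AddMemoryCache();
// builder.Services.AddTransient<DiscoveryResponseGenerator>();
// builder.Services.AddTransient<IDiscoveryResponseGenerator, CachingDiscoveryResponseGenerator>();
```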