Description
Hello there,
Some history behind the issue: we were using Synapse 1.98, and to avoid flooding from the presence feature we applied the patch from this issue, which worked well.
Recently we updated to Synapse 1.125 and the presence change storm came back. Since rate limiting was implemented in Synapse 1.124, we removed our patch to rely on the new mechanism. While being flooded, we noticed that the consequences of the issue are worse than 18 months ago: it now crashes the entire server.
In the logs we noticed something new, log lines like this:
2025-03-07 09:25:54,626 - root - 274 - WARNING - sentinel - Lock timeout is getting excessive: 747219185633533129613811404156259233971089331878494532580210173424746711084791089106082450955110608529191362653363165049891653914480755326168880379598947624597795819941201301722130188899194899188406241147079660310783397469716735872037991981377152862739140111812744262910064693812209876190943167529751342311021438043008766494386184259979355506403995836956665016959707590206415293601926385161774355252449900571232800892433503149036264336316069226081064285531353458681141174508391246376982247773496285591867121965950286134644640781796594767551542605817732736086479589950474537245937452620560279516506111800408344608704247581899721751284963506571721030207915157769086556621383347464138459152826527440828307470515494751150272907296532529180932330417382199017641281341938894247873084044949264513586530791715776157195864081695865474525857847622644476407492952918516142914929136797578388466774197963183523845956350122335878508129342025334156666552054326202550326485043561191351799892551223711961528027686644480218096762732291965060003061050028033335872339112405294435751639772818803451177588066926898761434627234487776464285349768365706382064791455203309929994154976493584140472710746933861949542801661630756263198164990225539326611874898302760099838901970968596141391648096727535495503897979803080672024027459912704362578416176911635397757139964739357038277851585728666776895373765608570647686093473887308201039142657940120297326956049882822960502454146283788858787588946385242858784617533056316840416022802656062667216587906480699108353098466195511463041051980606728234595368633567054652240515183387619626712437486142060904412242208068073853349370190349996437020917874281874472449309457533001115061932946571257993872151716476965154380052158271761367515928848363670361536804997687046238888318189890904565417990783522062983779886083495162425327856299635772197082120679631818500477576148326700659966608263667536127973810200161576847547909210584488635238169514578824197102334699499850271788619662231511157880598626061015924144894732401615621048000940137900441781917214112907301246928668644754020326849970793393644133655431868844779408488531743147107928212140321119728600694934961986174451590811475349509285023725593029583713348770303758571147680603011472587785179189642925094732465784568894329378358171617130496801135984837964087019951017442976575063637917851931663233704807776385333953560830251511622299846466485700728652105643881807497353915555948896111037116287626148541731716755278088886709216185081733693593143846299786623744861263510060101479838004698863937642591585510624962890110914495234927458425690031665268865820106811281692028642365755300516005299655933904736160288065523583867888095499126376994279388156945594517570997481903250147699316666884004524181778850203923086527069856639969458336830856557035167315582885378566579015236025670281086122217638433805035789671710740252287834361079927478829034217608879097774080s. There may be a deadlock.
We saw that this logging was introduced in worker_lock.py 3 weeks ago, but the figure reported for the retry interval seems abnormal.
What can cause such insane figures in the worker locks?
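To illustrate the order of magnitude, here is a minimal sketch. It is our own illustration, not Synapse's actual worker_lock.py logic: it just shows that a retry interval doubled on every failed lock attempt, with no upper bound, quickly reaches numbers with thousands of digits like the one in the warning above.

```python
# Minimal sketch (NOT Synapse's actual worker_lock.py code): our suspicion is
# that the retry interval is doubled on every failed lock attempt without a cap.
retry_interval_ms = 100  # hypothetical starting interval of 0.1 s

for _ in range(10_000):  # hypothetical number of failed lock attempts
    retry_interval_ms *= 2  # uncapped exponential backoff

# After 10,000 doublings the interval has over 3,000 decimal digits,
# the same order of magnitude as the figure logged in the warning above.
print(len(str(retry_interval_ms)))  # -> 3013
```

If this reading is right, it would also explain why the value keeps growing for as long as the flood lasts.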
Steps to reproduce
Have multiple users use the service with multiple devices at the same time.
See the flood happen.
Check the logs for the weird entries.
Homeserver
Hosted homeserver
Synapse Version
1.125
Installation Method
Docker (matrixdotorg/synapse)
Database
PostgreSQL 15
Workers
Single process
Platform
Containers running on an AlmaLinux 9 host.
Configuration
No response
Relevant log output
2025-03-07 09:25:54,626 - root - 274 - WARNING - sentinel - Lock timeout is getting excessive: 747219185633533129613811404156259233971089331878494532580210173424746711084791089106082450955110608529191362653363165049891653914480755326168880379598947624597795819941201301722130188899194899188406241147079660310783397469716735872037991981377152862739140111812744262910064693812209876190943167529751342311021438043008766494386184259979355506403995836956665016959707590206415293601926385161774355252449900571232800892433503149036264336316069226081064285531353458681141174508391246376982247773496285591867121965950286134644640781796594767551542605817732736086479589950474537245937452620560279516506111800408344608704247581899721751284963506571721030207915157769086556621383347464138459152826527440828307470515494751150272907296532529180932330417382199017641281341938894247873084044949264513586530791715776157195864081695865474525857847622644476407492952918516142914929136797578388466774197963183523845956350122335878508129342025334156666552054326202550326485043561191351799892551223711961528027686644480218096762732291965060003061050028033335872339112405294435751639772818803451177588066926898761434627234487776464285349768365706382064791455203309929994154976493584140472710746933861949542801661630756263198164990225539326611874898302760099838901970968596141391648096727535495503897979803080672024027459912704362578416176911635397757139964739357038277851585728666776895373765608570647686093473887308201039142657940120297326956049882822960502454146283788858787588946385242858784617533056316840416022802656062667216587906480699108353098466195511463041051980606728234595368633567054652240515183387619626712437486142060904412242208068073853349370190349996437020917874281874472449309457533001115061932946571257993872151716476965154380052158271761367515928848363670361536804997687046238888318189890904565417990783522062983779886083495162425327856299635772197082120679631818500477576148326700659966608263667536127973810200161576847547909210584488635238169514578824197102334699499850271788619662231511157880598626061015924144894732401615621048000940137900441781917214112907301246928668644754020326849970793393644133655431868844779408488531743147107928212140321119728600694934961986174451590811475349509285023725593029583713348770303758571147680603011472587785179189642925094732465784568894329378358171617130496801135984837964087019951017442976575063637917851931663233704807776385333953560830251511622299846466485700728652105643881807497353915555948896111037116287626148541731716755278088886709216185081733693593143846299786623744861263510060101479838004698863937642591585510624962890110914495234927458425690031665268865820106811281692028642365755300516005299655933904736160288065523583867888095499126376994279388156945594517570997481903250147699316666884004524181778850203923086527069856639969458336830856557035167315582885378566579015236025670281086122217638433805035789671710740252287834361079927478829034217608879097774080s. There may be a deadlock.
Anything else that would be useful to know?
Glimpses of the logging can be seen here:
deadlock.mp4