@@ -268,10 +268,13 @@ selecting a server for a retry attempt.
 3a. Selecting the server for retry
 ''''''''''''''''''''''''''''''''''
 
-If the driver cannot select a server for a retry attempt or the newly selected
-server does not support retryable reads, retrying is not possible and drivers
-MUST raise the previous retryable error. In both cases, the caller is able to
-infer that an attempt was made.
+In a sharded cluster, the server on which the operation failed MUST be provided
+to the server selection mechanism as a deprioritized server.
+
+If the driver cannot select a server for
+a retry attempt or the newly selected server does not support retryable reads,
+retrying is not possible and drivers MUST raise the previous retryable error.
+In both cases, the caller is able to infer that an attempt was made.
 
 3b. Sending an equivalent command for a retry attempt
 '''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -357,9 +360,17 @@ and reflects the flow described above.
 */
 function executeRetryableRead(command, session) {
   Exception previousError = null;
+  Server previousServer = null;
   while true {
     try {
-      server = selectServer();
+      if (previousServer == null) {
+        server = selectServer();
+      } else {
+        // If a previous attempt was made, deprioritize the previous server
+        // where the command failed.
+        deprioritizedServers = [ previousServer ];
+        server = selectServer(deprioritizedServers);
+      }
     } catch (ServerSelectionException exception) {
       if (previousError == null) {
         // If this is the first attempt, propagate the exception.
@@ -416,9 +427,11 @@ and reflects the flow described above.
     } catch (NetworkException networkError) {
       updateTopologyDescriptionForNetworkError(server, networkError);
       previousError = networkError;
+      previousServer = server;
     } catch (NotWritablePrimaryException notPrimaryError) {
       updateTopologyDescriptionForNotWritablePrimaryError(server, notPrimaryError);
       previousError = notPrimaryError;
+      previousServer = server;
     } catch (DriverException error) {
       if (previousError != null) {
         throw previousError;
@@ -614,8 +627,8 @@ The spec concerns itself with retrying read operations that encounter a
 retryable error (i.e. no response due to network error or a response indicating
 that the node is no longer a primary). A retryable error may be classified as
 either a transient error (e.g. dropped connection, replica set failover) or
-persistent outage. If a transient error results in the server being marked as
-"unknown", a subsequent retry attempt will allow the driver to rediscover the
+persistent outage. If a transient error results in the server being marked as
+"unknown", a subsequent retry attempt will allow the driver to rediscover the
 primary within the designated server selection timeout period (30 seconds by
 default). If server selection times out during this retry attempt, we can
 reasonably assume that there is a persistent outage. In the case of a persistent
@@ -678,6 +691,9 @@ degraded performance can simply disable ``retryableReads``.
 Changelog
 =========
 
+:2023-08-??: Require that in a sharded cluster the server on which the
+             operation failed MUST be provided to the server selection
+             mechanism as a deprioritized server.
 :2023-08-21: Update Q&A that contradicts SDAM transient error logic
 :2022-11-09: CLAM must apply both events and log messages.
 :2022-10-18: When CSOT is enabled multiple retry attempts may occur.
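
The deprioritization requirement added to section 3a can be illustrated with a small sketch of what a selector might do with the list it receives. This is not taken from the spec or from any driver: the `Server` class, the `select_server` function, and its `deprioritized` parameter are hypothetical names, and the fallback to the full candidate list when every server is deprioritized is an assumption made so that a single-mongos deployment can still retry.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a driver's server description."""
    address: str


def select_server(candidates, deprioritized=()):
    """Pick a server, avoiding deprioritized ones when an alternative exists.

    Assumption: if every candidate is deprioritized, fall back to the full
    candidate list instead of failing server selection.
    """
    if not candidates:
        raise LookupError("no servers available")
    preferred = [s for s in candidates if s not in deprioritized]
    return random.choice(preferred or candidates)


mongos_a, mongos_b = Server("a:27017"), Server("b:27017")
# The first attempt failed on mongos_a, so the retry is steered to mongos_b.
retry_target = select_server([mongos_a, mongos_b], deprioritized=[mongos_a])
```

Treating the list as a preference rather than a hard exclusion keeps the retry possible when the failed server is the only one available, which matches the word "deprioritized" in the new requirement.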