Synchronous workflow update #3822

alexshtin · 2023-01-20T07:34:33Z

What changed?
First version of synchronous workflow update, implemented with messages.
Corresponding API changes: temporalio/api#253.
Messages protocol implementation added in #3843.

Why?
New feature, "Synchronous workflow update", which allows a caller to synchronously update a running workflow.

How did you test it?
New functional tests.

Potential risks
No risks.

Is hotfix candidate?
No.

mmcshane

Broadly speaking, looks good. I think we can make the Update type more powerful to consolidate a few things, but that can be a TODO. A couple of questions as well.

service/frontend/workflow_handler.go

mmcshane · 2023-01-23T14:37:33Z

service/history/commandChecker.go

+		} else {
+			return serviceerror.NewInvalidArgument(fmt.Sprintf("unknown message type: %v", message.GetBody().GetTypeUrl()))
+		}
+	}


I think this function admits the possibility of Accept followed by Reject and vice-versa. Not necessary for this PR but we should probably separate message validation semantics from protocol validation semantics.

This method only validates sequencing: it requires that the update.Response (i.e. complete) message goes last. I am not sure I got your point.
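To make the distinction in this thread concrete, here is a minimal sketch, not the code from this PR. The `kind` type and `validateSequencing` function are hypothetical names, and the only rule assumed is the one stated above: a response message must come last. Because it checks ordering only, it accepts an acceptance followed by a rejection, which is the gap the review comment points at.

```go
package main

import (
	"errors"
	"fmt"
)

// kind is a hypothetical stand-in for the message body type.
type kind int

const (
	kindAcceptance kind = iota
	kindRejection
	kindResponse
)

var errResponseNotLast = errors.New("response message must be last")

// validateSequencing checks ordering only: a response may appear
// only as the final message. It does not check whether the
// combination of messages makes sense for a single update.
func validateSequencing(msgs []kind) error {
	for i, k := range msgs {
		if k == kindResponse && i != len(msgs)-1 {
			return errResponseNotLast
		}
	}
	return nil
}

func main() {
	// Passes the sequencing check even though accept-then-reject
	// is not a meaningful exchange.
	fmt.Println(validateSequencing([]kind{kindAcceptance, kindRejection}))
	// Fails: the response is followed by another message.
	fmt.Println(validateSequencing([]kind{kindResponse, kindAcceptance}))
}
```

Separating message validation from protocol validation would mean adding a second check for which combinations are legal, rather than growing this one.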

mmcshane · 2023-01-23T14:39:25Z

service/history/historyEngine.go

+	}
+
+	req := request.GetRequest()
+	updateRegistry := ms.GetUpdateRegistry()


Will we go toward a more general ProtocolRegistry eventually or are we going to stick with protocol-type-specific registries?

I don't know yet. Currently, the update registry acts more like an "update callers registry", and it seems more natural to have one per user-facing API (i.e. UpdateWorkflowExecution). I think we need at least one more protocol implementation to make a call.
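For readers unfamiliar with the "callers registry" idea, the following is a rough sketch under stated assumptions. `Registry`, `Add`, and `Complete` are illustrative names and signatures and are not taken from `service/history/workflow/update/registry.go`; the sketch only shows a blocked caller being registered by update ID and later handed a result.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical in-memory map from update ID to the
// channel a blocked caller is waiting on.
type Registry struct {
	mu      sync.Mutex
	waiters map[string]chan string
}

func NewRegistry() *Registry {
	return &Registry{waiters: make(map[string]chan string)}
}

// Add registers a caller and returns the channel to wait on plus a
// function that removes the entry again.
func (r *Registry) Add(id string) (<-chan string, func()) {
	ch := make(chan string, 1) // buffered so Complete never blocks
	r.mu.Lock()
	r.waiters[id] = ch
	r.mu.Unlock()
	return ch, func() {
		r.mu.Lock()
		delete(r.waiters, id)
		r.mu.Unlock()
	}
}

// Complete delivers a result to the waiting caller, if any, and
// drops the entry so a result is sent at most once.
func (r *Registry) Complete(id, result string) bool {
	r.mu.Lock()
	ch, ok := r.waiters[id]
	delete(r.waiters, id)
	r.mu.Unlock()
	if ok {
		ch <- result
	}
	return ok
}

func main() {
	reg := NewRegistry()
	ch, remove := reg.Add("update-1")
	defer remove()
	go reg.Complete("update-1", "done")
	fmt.Println(<-ch)
}
```

A more general protocol registry would presumably key on a protocol instance rather than on a specific API call, which is the open question in this thread.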

service/history/historyEngine.go

service/history/workflow/mutable_state_rebuilder.go

mmcshane · 2023-01-23T14:59:02Z

service/history/workflow/update/update.go

+}
+
+func (u *Update) MessageID() string {
+	return u.messageID


This seems strange ... is this the messageID of the initial request message? Might just need a better function name here.

This is also gone.

mmcshane · 2023-01-23T15:02:33Z

service/history/workflow/workflow_task_state_machine.go

 	return startedEvent, workflowTask, err
 }
+func (m *workflowTaskStateMachine) skipWorkflowTaskCompleted(workflowTaskType enumsspb.WorkflowTaskType, request *workflowservice.RespondWorkflowTaskCompletedRequest) bool {
+	if workflowTaskType != enumsspb.WORKFLOW_TASK_TYPE_SPECULATIVE || len(request.GetCommands()) != 0 {


FWIW, this function actually gets easier if we end up going down the ProtocolCommand route.

mmcshane · 2023-01-23T15:22:47Z

service/history/workflowTaskHandler.go

+	}
+
+	return nil
+}


TODO but later - we definitely want to do this dispatch within a protocol state machine. On the sdk-side we find the protocol object by protocol_instance_id and then just call HandleMessage.
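The dispatch style described here can be sketched as follows. This is an assumption-laden illustration, not SDK or server code: `Message`, `Protocol`, `dispatch`, and `recorder` are hypothetical names, and the only idea carried over from the comment is "find the protocol object by protocol instance ID, then call HandleMessage".

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a simplified stand-in for a protocol message.
type Message struct {
	ProtocolInstanceID string
	Body               string
}

// Protocol is a hypothetical per-instance state machine.
type Protocol interface {
	HandleMessage(msg Message) error
}

var errUnknownInstance = errors.New("unknown protocol instance")

// dispatch looks up the protocol instance and hands the message
// over without inspecting the message type itself.
func dispatch(instances map[string]Protocol, msg Message) error {
	p, ok := instances[msg.ProtocolInstanceID]
	if !ok {
		return errUnknownInstance
	}
	return p.HandleMessage(msg)
}

// recorder is a trivial Protocol that remembers what it has seen.
type recorder struct{ seen []string }

func (r *recorder) HandleMessage(msg Message) error {
	r.seen = append(r.seen, msg.Body)
	return nil
}

func main() {
	rec := &recorder{}
	instances := map[string]Protocol{"update-1": rec}
	fmt.Println(dispatch(instances, Message{ProtocolInstanceID: "update-1", Body: "accept"}))
	fmt.Println(dispatch(instances, Message{ProtocolInstanceID: "missing", Body: "accept"}))
	fmt.Println(rec.seen)
}
```

With this shape, type-specific branching moves out of the task handler and into each protocol's own `HandleMessage`.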

service/history/workflow/mutable_state_impl.go

mmcshane · 2023-01-23T15:38:55Z

service/history/workflowTaskHandlerCallbacks.go

 	var newWorkflowTaskScheduledEventID int64
 	if createNewWorkflowTask {
+		// TODO (alex-update): Need to support case when ReturnNewWorkflowTask=false and WT.Type=Speculative.
+		// In this case WT needs to be added directly to matching.
+		// Current implementation will create normal WT.


I understand now - good comment.

alexshtin · 2023-01-26T01:35:17Z

#3848 should fix linter warnings.

MichaelSnowden · 2023-01-30T19:55:09Z

service/frontend/workflow_handler.go

@@ -3647,6 +3649,35 @@ func (wh *WorkflowHandler) UpdateWorkflowExecution(
 		return nil, errRequestNotSet
 	}

+	if err := validateExecution(request.GetWorkflowExecution()); err != nil {


Consider using a parsed type here instead: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

Examples in Haskell are not the best way to convince people to follow your ideas :-) But I got the point. It is worth considering this approach project-wide.
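Since the linked article uses Haskell, here is a small Go sketch of the same idea. It is only an illustration: `RawExecution`, `Execution`, and `ParseExecution` are hypothetical names, and the single rule shown (non-empty workflow ID) is an assumption, not the actual logic of `validateExecution`.

```go
package main

import (
	"errors"
	"fmt"
)

// RawExecution mimics an unvalidated request field.
type RawExecution struct {
	WorkflowID string
	RunID      string
}

// Execution is meant to be obtained through ParseExecution, so
// holding one signals that the checks have already run.
type Execution struct {
	workflowID string
	runID      string
}

func (e Execution) WorkflowID() string { return e.workflowID }
func (e Execution) RunID() string      { return e.runID }

var errWorkflowIDNotSet = errors.New("workflow ID is not set")

// ParseExecution applies an illustrative rule (non-empty workflow
// ID) and returns the parsed type on success.
func ParseExecution(raw *RawExecution) (Execution, error) {
	if raw == nil || raw.WorkflowID == "" {
		return Execution{}, errWorkflowIDNotSet
	}
	return Execution{workflowID: raw.WorkflowID, runID: raw.RunID}, nil
}

func main() {
	exec, err := ParseExecution(&RawExecution{WorkflowID: "wf-1"})
	fmt.Println(exec.WorkflowID(), err)

	_, err = ParseExecution(&RawExecution{})
	fmt.Println(err)
}
```

Downstream functions would then accept `Execution` instead of the raw request field, so they cannot be called with unchecked input. For the unexported fields to enforce this, the type would need to live in its own package.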

MichaelSnowden · 2023-01-30T19:57:36Z

service/history/api/updateworkflow/api.go

+		return nil, consts.ErrWorkflowExecutionNotFound
+	}
+
+	upd, removeFn := ms.UpdateRegistry().Add(req.GetRequest().GetRequest())


Is there anything we can do to prevent this stuttering?

Those are all different "requests" and I hate this stuttering but:

historyUpdateRequest.GetFrontendUpdateRequest().GetUpdateRequest()

would be even worse.

MichaelSnowden

There's a bit of tech debt added here, but most of it's tagged, and I think the importance of this change outweighs the cost

alexshtin · 2023-01-30T21:14:32Z

There is even more tech debt in my head, and I admit, I contributed to it. I left a bunch of TODOs for myself and I promise to address them (and those from my head too).

yycptt

Some other points we synced on offline:

Reapply updates (future work)
Replication layer should not return Not Implemented for new event types
Check workflow completion when applying messages
Anything else? I remember there may be 1-2 more.

service/history/api/updateworkflow/api.go

service/history/workflow/update/registry.go

yycptt · 2023-01-31T02:48:08Z

service/history/workflow/workflow_task_state_machine.go

+	// Always bypass task generation for speculative WT.
+	if workflowTask.Type != enumsspb.WORKFLOW_TASK_TYPE_SPECULATIVE {


Speculative WT doesn't have a (start-to-close) timeout?

It seems like I need a special in-memory timer to handle this timeout. I made a note and will do it in a separate PR.

service/history/workflow/workflow_task_state_machine.go

service/history/workflowTaskHandler.go

yycptt · 2023-01-31T03:12:56Z

service/history/workflowTaskHandler.go

+		updResponse.Size(),
+		"Message body of type update.Response exceeds size limit.",
+	); err != nil {
+		return handler.failWorkflow(enumspb.WORKFLOW_TASK_FAILED_CAUSE_BAD_UPDATE_WORKFLOW_EXECUTION_MESSAGE, err)


Will this add a workflow task failed event to the history for speculative WT?

Yes, this seems to be broken and will lead to a WTFailed event without preceding WTScheduled/WTStarted events. I will address this in a separate PR.

This was addressed here. Current behaviour: the first speculative WT failure is not written to the history, but a new WT is recreated as "normal" and gets all the corresponding events in the history; if it keeps failing, it becomes transient. I need to think about whether this should be changed in the future.

yycptt · 2023-02-01T18:33:04Z

service/history/consts/const.go

@@ -92,6 +92,8 @@ var (
 	ErrWorkflowTaskNotScheduled = serviceerror.NewWorkflowNotReady("Workflow task is not scheduled yet.")
 	// ErrNamespaceHandover is error indicating namespace is in handover state and cannot process request.
 	ErrNamespaceHandover = common.ErrNamespaceHandover
+	// ErrWorkflowTaskStateInconsistent is error indicating workflow task state is inconsistent, for example there was no workflow task scheduled but buffered events are present.
+	ErrWorkflowTaskStateInconsistent = serviceerror.NewUnavailable("Workflow task state is inconsistent.")


Internal error maybe? I don't think retry can help in that inconsistent state?

I think a retry could help: the WT can complete, buffered events might get flushed, and maybe something else changes. I would leave it as Unavailable for now.

yycptt · 2023-02-01T18:39:21Z

service/history/workflow/mutable_state_rebuilder.go

-			return nil, serviceerror.NewUnimplemented("Workflow Update rebuild not implemented")
+			// TODO (alex-update): Async workflow update might require update to be restored in registry from Accepted event.
+			//  Completed event will remove it from registry and notify update result pollers.
+			return nil, nil


Please add a comment here saying that no change is needed on mutable state and no task needs to be generated for those events.

Is it possible that update related events are in different batches (e.g. one batch contains accepted and completed is in another batch)? If so, we need to handle the case where only some batches are replicated and then failover happens.

Updated comment, and will think about event batches later.

alexshtin force-pushed the feature/update-workflow-messages branch 2 times, most recently from 579b2cb to cf7bc20 Compare January 21, 2023 00:54

mmcshane approved these changes Jan 23, 2023

alexshtin force-pushed the feature/update-workflow-messages branch 3 times, most recently from 14e9758 to c60edb5 Compare January 25, 2023 18:44

alexshtin mentioned this pull request Jan 25, 2023

Messages protocol implementation #3843

Merged

alexshtin force-pushed the feature/update-workflow-messages branch 2 times, most recently from 4a0c1b7 to bf053fa Compare January 25, 2023 23:10

alexshtin marked this pull request as ready for review January 25, 2023 23:12

alexshtin requested a review from a team as a code owner January 25, 2023 23:12

alexshtin force-pushed the feature/update-workflow-messages branch 5 times, most recently from 7b32add to bd8d3ef Compare January 30, 2023 19:28

MichaelSnowden reviewed Jan 30, 2023

MichaelSnowden approved these changes Jan 30, 2023

yycptt reviewed Jan 31, 2023

alexshtin force-pushed the feature/update-workflow-messages branch from bd8d3ef to a53977d Compare February 1, 2023 02:27

yycptt approved these changes Feb 1, 2023

alexshtin force-pushed the feature/update-workflow-messages branch from a53977d to 238136a Compare February 1, 2023 20:15

alexshtin added 4 commits February 1, 2023 13:27

Sync workflow update using messages

9a70f29

Remove sequencing_id calculations

fb3fd17

Minor improvements

22e59a4

Address feedback

558a42d

alexshtin force-pushed the feature/update-workflow-messages branch from 238136a to 558a42d Compare February 1, 2023 21:27

alexshtin merged commit cbd73e0 into temporalio:master Feb 1, 2023

alexshtin deleted the feature/update-workflow-messages branch February 1, 2023 23:13

fourth-engineer mentioned this pull request Mar 22, 2023

Synchronous workflow update temporalio/sdk-java#1708

Closed

samanbarghi mentioned this pull request Apr 18, 2023

Add command for workflow update temporalio/cli#200

Merged
