Eager workflow dispatch #3835

bergundy · 2023-01-25T01:22:56Z

Implement eager workflow dispatch for StartWorkflowExecution.

I've only added a single unit test to this repo; the rest are in this PR in the features repo.
Added a counter metric, workflow_eager_execution, that counts eager execution requests per namespace and task queue.

I've added a TODO for eager signal-with-start support; I figured that can be added later.


yycptt · 2023-01-27T08:16:54Z

service/history/api/startworkflow/api.go

+	// The current workflow task is not inflight or not the first task or we exceeded the first attempt and fell back to
+	// matching based dispatch.
+	if !mutableStateInfo.hasInflight || mutableStateInfo.workflowTaskInfo.StartedEventID != 3 || mutableStateInfo.workflowTaskInfo.Attempt > 1 {
+		return nil, serviceerror.NewWorkflowExecutionAlreadyStarted(


Why return error here? Start workflow is still a success.

Because we cannot return an eager task.
The caller should be notified with an error so it can handle this case.
I think that's clearer than returning a nil task.

I'm on the fence whether this should result in an error or not.
In either case, the caller should get a handle to the workflow and wait for its completion.
With the error approach there's at least a way to let the caller know what happened.

@Spikhalskiy @cretz I could use your opinion here

Personally, I think it's clearer/safer to consider eagerness a request to the server, not a requirement. The server is allowed to use whatever heuristics it wants to deny the request and fall back to a non-eager start, and the start is still successful. I think the absence of a task in the response is enough to tell the caller the server denied the request.

I understand your use case is a "require to be eager" but that use case isn't supported for activities either. If we want a require-to-be-eager to be a thing, it should be a thing on both. I could support a server-side/namespace option of "fail if eager requested but cannot be given".

(my opinion is not super strong here...if we agree that eager workflow tasks are a server requirement not a request, we can just doc clearly and then error if it cannot be granted)

Hrmm...k.

So if I call this start, may a workflow start because of it? If so, IMO that should never error even if something post-start can't happen. To me success means "workflow started because of this request" and failure means "workflow did not start because of this request". Unsure if that's related. Sorry, not familiar w/ details here so I don't have a big opinion.

If a client gets this error, it means that a retried request came in too late and the workflow task cannot be dispatched eagerly (likely because it was already dispatched via the standard, matching-based path).

The only thing an error gives you over the alternative where the server responds successfully but omits the inline task is that additional piece of information.

Simply - if success means workflow started by this request and failure means workflow not started by request, works for me. If failure can mean workflow still started by this request, that's confusing to me.

Here the workflow was started from a previous attempt of the same request. So it seems like you're saying it should not be an error.

I tend to agree but I will add a log to avoid losing some of this information.

Ended up returning a successful result and recording a metric, with a reason tag, indicating that eager execution was denied.
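A minimal Go sketch of that outcome, for illustration only. The eligibility check mirrors the condition in the quoted diff (the first workflow task must be in flight, have StartedEventID 3, and still be on its first attempt). workflowTaskInfo, startResponse, respond, the reason value task_already_dispatched, and the map standing in for a tagged counter are all hypothetical, not the server's actual types or metric names.

```go
package main

import "fmt"

// workflowTaskInfo is a hypothetical stand-in for the workflow task fields read from mutable state.
type workflowTaskInfo struct {
	StartedEventID int64
	Attempt        int32
}

// eagerDenialReason is a hypothetical, low-cardinality value for the metric's reason tag.
type eagerDenialReason string

const (
	reasonNone                  eagerDenialReason = ""
	reasonTaskAlreadyDispatched eagerDenialReason = "task_already_dispatched"
)

// In a fresh history, event 1 is WorkflowExecutionStarted, event 2 is
// WorkflowTaskScheduled and event 3 is WorkflowTaskStarted.
const firstWorkflowTaskStartedEventID = 3

// denialReason mirrors the quoted condition: the task must be in flight,
// be the first workflow task, and still be on its first attempt.
func denialReason(hasInflight bool, info workflowTaskInfo) eagerDenialReason {
	if !hasInflight || info.StartedEventID != firstWorkflowTaskStartedEventID || info.Attempt > 1 {
		return reasonTaskAlreadyDispatched
	}
	return reasonNone
}

// startResponse is a hypothetical stand-in for the start response.
type startResponse struct {
	RunID             string
	EagerWorkflowTask *workflowTaskInfo // nil when eager execution was denied
}

// respond never fails because of a denial: it counts the denial under its
// reason and returns a successful response without the eager task.
func respond(runID string, hasInflight bool, info workflowTaskInfo, denied map[eagerDenialReason]int) *startResponse {
	if reason := denialReason(hasInflight, info); reason != reasonNone {
		denied[reason]++ // stand-in for incrementing a counter tagged with the reason
		return &startResponse{RunID: runID}
	}
	return &startResponse{RunID: runID, EagerWorkflowTask: &info}
}

func main() {
	denied := map[eagerDenialReason]int{}
	first := respond("run-1", true, workflowTaskInfo{StartedEventID: 3, Attempt: 1}, denied)
	secondAttempt := respond("run-1", true, workflowTaskInfo{StartedEventID: 3, Attempt: 2}, denied)
	fmt.Println(first.EagerWorkflowTask != nil, secondAttempt.EagerWorkflowTask == nil, denied[reasonTaskAlreadyDispatched])
}
```

With this shape, a caller that receives no eager task does what the thread describes: get a handle to the workflow and wait for its completion.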


yycptt · 2023-01-31T22:20:36Z

service/history/api/startworkflow/api.go

+			TaskToken:                  serializedToken,
+			WorkflowExecution:          &commonpb.WorkflowExecution{WorkflowId: workflowID, RunId: runID},
+			WorkflowType:               request.GetWorkflowType(),
+			PreviousStartedEventId:     0,


nit: ideally we should get the value from mutable state.


yux0 · 2023-01-31T23:56:54Z

service/history/api/create_workflow_util.go

 		return nil, err
 	}

+	// If first workflow task should back off (e.g. cron or workflow retry) a workflow task will not be scheduled.
+	if requestEagerExecution && scheduledEventID != 0 {


nit: for readability use newMutableState.HasPendingWorkflowTask()?

I'm not sure I find it more readable but I'm fine with making this change
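The nit comes down to naming the check instead of comparing an event ID against 0. A hypothetical Go sketch: mutableState and its pendingWorkflowTaskScheduledEventID field are invented stand-ins, and emptyEventID mirrors common.EmptyEventID (which is 0). It assumes, as the diff's comment implies, that a first task that backs off (cron or workflow retry) leaves no pending workflow task, so both forms of the check agree.

```go
package main

import "fmt"

// emptyEventID mirrors common.EmptyEventID, which is 0.
const emptyEventID int64 = 0

// mutableState is a hypothetical, trimmed-down stand-in for the real mutable state.
type mutableState struct {
	pendingWorkflowTaskScheduledEventID int64
}

// HasPendingWorkflowTask reports whether a workflow task has been scheduled.
func (ms *mutableState) HasPendingWorkflowTask() bool {
	return ms.pendingWorkflowTaskScheduledEventID != emptyEventID
}

// canDispatchEagerly expresses requestEagerExecution && scheduledEventID != 0
// through the named predicate instead of the raw event ID comparison.
func canDispatchEagerly(requestEagerExecution bool, ms *mutableState) bool {
	return requestEagerExecution && ms.HasPendingWorkflowTask()
}

func main() {
	scheduled := &mutableState{pendingWorkflowTaskScheduledEventID: 2}
	backingOff := &mutableState{} // first task backs off, so nothing is scheduled yet
	fmt.Println(canDispatchEagerly(true, scheduled), canDispatchEagerly(true, backingOff))
}
```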


MichaelSnowden

Could we abstract an ExecutionStrategy interface here in a separate PR to keep things SOLID and make the review easier? The code looks good to me, but it's hard to review with all of the modifications to the existing code.

MichaelSnowden · 2023-02-01T21:38:39Z

common/metrics/tags.go

+
+// ReasonTag is a generic tag can be used anywhere a reason is needed.
+// Make sure that the value is of limited cardinality.
+func ReasonTag(value string) Tag {


Let's take in an opaque enum type here to prevent misuse. I think the safety benefit outweighs the tedium involved.

I'm not sure I follow. Do you mean defining a type like:

// ReasonString is just a string but used to remind anyone using ReasonTag to limit the cardinality of the possible reasons.
type ReasonString string

If that's the case, I see little benefit to that over documenting the ask to limit the cardinality of values.
I already have a custom string enum type where this is used.

I ended up adding this.
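A self-contained Go sketch of that approach, for reference. Tag and its "reason" key are hypothetical stand-ins for the metrics package's types, and reasonTaskAlreadyDispatched is an invented example value. Untyped string constants convert to ReasonString implicitly, so ReasonTag("literal") still compiles; the named type acts as a reminder about cardinality rather than an enforcement mechanism, which is roughly the trade-off discussed in this thread.

```go
package main

import "fmt"

// Tag is a hypothetical stand-in for the metrics Tag type.
type Tag struct {
	Key   string
	Value string
}

// ReasonString is just a string, but the named type reminds anyone calling
// ReasonTag to keep the set of possible reasons small (low cardinality).
type ReasonString string

// ReasonTag is a generic tag that can be used anywhere a reason is needed.
func ReasonTag(value ReasonString) Tag {
	return Tag{Key: "reason", Value: string(value)}
}

// Callers declare their reasons as typed constants, keeping the set closed.
const reasonTaskAlreadyDispatched ReasonString = "task_already_dispatched"

func main() {
	fmt.Println(ReasonTag(reasonTaskAlreadyDispatched).Value)

	// A plain string variable needs an explicit conversion, which makes
	// passing arbitrary (possibly high-cardinality) data stand out in review.
	dynamic := "some-request-id"
	fmt.Println(ReasonTag(ReasonString(dynamic)).Value)
}
```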

MichaelSnowden · 2023-02-01T21:41:54Z

service/history/api/create_workflow_util.go

@@ -99,16 +99,32 @@ func NewWorkflowWithSignal(
 			return nil, err
 		}
 	}
-
+	requestEagerExecution := startRequest.StartRequest.GetRequestEagerExecution()


This is the first usage of startRequest.StartRequest that I see in this function, so I'm worried about potential NPEs. How do we know this is always non-nil?

We don't; we rely on the request being validated before this method is called.
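Some context on the nil concern: protoc-generated Go getters check for a nil receiver and return the zero value, so a chain of Get calls on a nil inner message yields false or "" instead of panicking, while direct field access (for example the s.request.StartRequest.TaskQueue.Name chain in the metrics code) does not. The outer request must still be non-nil, since .StartRequest is a plain field access. The types below are hand-written stand-ins that imitate the generated pattern; they are not the real request types.

```go
package main

import "fmt"

// The types below are hand-written stand-ins that imitate protoc-generated code.
type taskQueue struct {
	Name string
}

type startWorkflowExecutionRequest struct {
	RequestEagerExecution bool
	TaskQueue             *taskQueue
}

type historyStartRequest struct {
	StartRequest *startWorkflowExecutionRequest
}

// Generated getters tolerate a nil receiver and return the zero value.
func (m *startWorkflowExecutionRequest) GetRequestEagerExecution() bool {
	if m != nil {
		return m.RequestEagerExecution
	}
	return false
}

func (m *startWorkflowExecutionRequest) GetTaskQueue() *taskQueue {
	if m != nil {
		return m.TaskQueue
	}
	return nil
}

func (m *taskQueue) GetName() string {
	if m != nil {
		return m.Name
	}
	return ""
}

func main() {
	req := &historyStartRequest{} // outer request set, inner StartRequest nil

	// Safe: each getter returns a zero value instead of dereferencing nil.
	fmt.Println(req.StartRequest.GetRequestEagerExecution())      // false
	fmt.Printf("%q\n", req.StartRequest.GetTaskQueue().GetName()) // ""

	// Direct field access such as req.StartRequest.TaskQueue.Name would panic here,
	// and so would any access through a nil req itself.
}
```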

MichaelSnowden · 2023-02-01T21:49:05Z

service/history/workflow/mutable_state.go

@@ -109,7 +109,7 @@ type (
 		AddWorkflowTaskCompletedEvent(int64, int64, *workflowservice.RespondWorkflowTaskCompletedRequest, int) (*historypb.HistoryEvent, error)
 		AddWorkflowTaskFailedEvent(scheduledEventID int64, startedEventID int64, cause enumspb.WorkflowTaskFailedCause, failure *failurepb.Failure, identity, binChecksum, baseRunID, newRunID string, forkEventVersion int64) (*historypb.HistoryEvent, error)
 		AddWorkflowTaskScheduleToStartTimeoutEvent(int64) (*historypb.HistoryEvent, error)
-		AddFirstWorkflowTaskScheduled(*historypb.HistoryEvent) error
+		AddFirstWorkflowTaskScheduled(event *historypb.HistoryEvent, bypassTaskGeneration bool) (int64, error)


I'd split this into two separate methods instead of adding a flag argument

The interface already has way too many methods IMHO.

MichaelSnowden · 2023-02-01T21:57:02Z

service/history/api/startworkflow/api.go

+}
+
+// prepare applies request overrides, validates the request, and records eager execution metrics.
+func (s *Starter) prepare(ctx context.Context) error {


Consider parsing the proto into a new StartRequest type instead of modifying the request: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

Is this something that is commonly done in Go?
If this were JS or a functional language or a language that has the concept of immutable data, I would totally clone the request.

I'm thinking of the same pattern we use for tasks here: https://github.com/temporalio/temporal/blob/66db3aebeead81869dcc864e0f174063b167ee10/common/persistence/serialization/task_serializer.go

Basically taking the proto and parsing it into a plain Go object with the fields already validated and parsed into more structured types.

I conceptually agree here but since the raw protos are used in the internal APIs, parsing does not make sense in this case.
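A hypothetical Go sketch of the "parse, don't validate" shape being suggested, which also addresses the Law of Demeter concern raised below: the raw request is checked once and turned into a flat value, so downstream code reads req.taskQueue instead of reaching through request.StartRequest.TaskQueue.Name. rawStartRequest, startRequest, and parseStartRequest are invented for illustration. As the reply points out, the raw protos are used in the internal APIs, which is the cost this approach would have to pay.

```go
package main

import (
	"errors"
	"fmt"
)

// taskQueue and rawStartRequest are hypothetical stand-ins for the wire-level
// proto, where any field may be missing.
type taskQueue struct {
	Name string
}

type rawStartRequest struct {
	Namespace             string
	WorkflowID            string
	TaskQueue             *taskQueue
	RequestEagerExecution bool
}

// startRequest is the parsed form. Code that receives one from
// parseStartRequest can use its fields without re-checking them.
type startRequest struct {
	namespace      string
	workflowID     string
	taskQueue      string
	eagerRequested bool
}

// parseStartRequest checks the raw request once and returns a flat, validated value.
func parseStartRequest(r *rawStartRequest) (startRequest, error) {
	if r == nil {
		return startRequest{}, errors.New("request is nil")
	}
	if r.WorkflowID == "" {
		return startRequest{}, errors.New("workflow ID is required")
	}
	if r.TaskQueue == nil || r.TaskQueue.Name == "" {
		return startRequest{}, errors.New("task queue is required")
	}
	return startRequest{
		namespace:      r.Namespace,
		workflowID:     r.WorkflowID,
		taskQueue:      r.TaskQueue.Name,
		eagerRequested: r.RequestEagerExecution,
	}, nil
}

func main() {
	req, err := parseStartRequest(&rawStartRequest{
		Namespace:  "default",
		WorkflowID: "wf-1",
		TaskQueue:  &taskQueue{Name: "tq"},
	})
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	// Metrics and other downstream code read the parsed fields directly.
	fmt.Println(req.namespace, req.taskQueue)
}
```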

MichaelSnowden · 2023-02-01T21:59:33Z

service/history/api/startworkflow/api.go

+
+// creationContext is a container for all information obtained from creating the uncommitted execution.
+// The information is later used to create a new execution and handle conflicts.
+type creationContext struct {


nit: I'd rename this to creationParams to avoid having another somethingCtx param floating around because devs will be unsure whether it embeds an actual Context or not

MichaelSnowden · 2023-02-01T22:01:45Z

service/history/api/startworkflow/api.go

+	metricsHandler.Counter(metrics.WorkflowEagerExecutionDeniedCounter.GetMetricName()).Record(
+		1,
+		metrics.NamespaceTag(s.namespace.Name().String()),
+		metrics.TaskQueueTag(s.request.StartRequest.TaskQueue.Name),


This is another place where using a parsed type instead of the raw StartRequest is better because it prevents Law of Demeter violations here and elsewhere

MichaelSnowden · 2023-02-01T22:03:55Z

service/history/api/startworkflow/api.go

+	if err == nil {
+		return s.generateResponse(creationCtx.runID, creationCtx.workflowTaskInfo, extractHistoryEvents(creationCtx.workflowEventBatches))
+	}
+	t, ok := err.(*persistence.CurrentWorkflowConditionFailedError)


Please use errors.As.

I copied this from the original implementation but yes, As is better here.

MichaelSnowden · 2023-02-01T22:09:28Z

service/history/api/startworkflow/api.go

+	// The history and mutable state we generated above should be deleted by a background process.
+	return s.handleConflict(ctx, creationCtx, t)


What happens if we crash before reaching this line?

I don't understand the question

For this comment:

// The history and mutable state we generated above should be deleted by a background process.

Is handleConflict the method that deletes them?

alexshtin

WT state machine changes are not in conflict with workflow update changes.

alexshtin · 2023-02-01T21:35:31Z

service/history/workflow/mutable_state_impl.go

 	opTag := tag.WorkflowActionWorkflowTaskScheduled
 	if err := ms.checkMutability(opTag); err != nil {
-		return err
+		return 0, err


There is common.EmptyEventID (which is 0) and it fits perfectly here.

alexshtin · 2023-02-01T22:47:23Z

service/history/handler.go

 	if err != nil {
 		return nil, h.convertError(err)
 	}


This error check needs to be inside if block.

It probably doesn't matter much but you're right.


Eager workflow dispatch

ea6b281

bergundy self-assigned this Jan 25, 2023

bergundy requested a review from a team as a code owner January 25, 2023 01:22

bergundy commented Jan 25, 2023

service/history/api/create_workflow_util.go (outdated)

bergundy commented Jan 25, 2023

service/history/api/create_workflow_util.go (outdated)

bergundy added 2 commits January 24, 2023 17:39

Revert returning workflowTaskInfo from NewWorkflowWithSignal

5f23b7c

Fix lint issues

bb221c7

bergundy commented Jan 25, 2023

service/history/api/startworkflow/api.go (outdated)

bergundy commented Jan 25, 2023

service/history/api/startworkflow/api.go (outdated)

bergundy and others added 12 commits January 25, 2023 11:15

Minor restructuring

c3a4cd9

Support eager start with TERMINATE_IF_RUNNING

e8b8fba

Properly release workflow context

fa4dccf

Add documentation and restructure for better readability

4e49875

Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

38f628e

Fix lint

fc50446

Run go-generate

15158f9

Fix nolint directive

cc64258

More restructuring, get rid of cyclo complexity

06aebdc

Fix task inflight condition

f32e716

Merge branch 'master' into eager-workflow-dispatch

bb00ebe

Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

6394cbb

yux0 reviewed Jan 27, 2023

service/history/api/create_workflow_util.go

yycptt reviewed Jan 27, 2023


yycptt requested review from yiminc and alexshtin January 27, 2023 08:51

bergundy and others added 2 commits January 27, 2023 10:26

Merge branch 'master' into eager-workflow-dispatch

8dba7ff

Address review comments

593a669

bergundy force-pushed the eager-workflow-dispatch branch from 1e7342a to 593a669 on January 28, 2023 01:56

Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

a01484b

bergundy commented Jan 28, 2023

config/dynamicconfig/development-sql.yaml

yycptt reviewed Jan 31, 2023


yycptt approved these changes Jan 31, 2023


yux0 reviewed Jan 31, 2023


Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

7dae88e

bergundy force-pushed the eager-workflow-dispatch branch from 49d58ce to 5b10c6c on February 1, 2023 01:14

Address review comments

9a3a16e

bergundy force-pushed the eager-workflow-dispatch branch from 5b10c6c to 9a3a16e on February 1, 2023 01:16

bergundy enabled auto-merge (squash) February 1, 2023 01:18

yycptt reviewed Feb 1, 2023

service/history/api/startworkflow/api.go

service/history/api/startworkflow/api.go

bergundy added 4 commits January 31, 2023 19:59

Fix missing reason tag value

41296fc

Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

65f9726

Add missing nil check

efece49

Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

f23a706

MichaelSnowden reviewed Feb 1, 2023


alexshtin reviewed Feb 1, 2023


Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

ab77d10

bergundy disabled auto-merge February 2, 2023 00:14

Address review comments

e0a69c2

bergundy force-pushed the eager-workflow-dispatch branch from 16445ad to e0a69c2 on February 2, 2023 00:54

Merge remote-tracking branch 'origin/master' into eager-workflow-dispatch

efd06d7

bergundy enabled auto-merge (squash) February 2, 2023 01:06

bergundy merged commit 6ef7749 into temporalio:master Feb 2, 2023

bergundy deleted the eager-workflow-dispatch branch February 2, 2023 01:35

Spikhalskiy mentioned this pull request Mar 7, 2023

[Feature Request] Eager Workflow Task Dispatch on SDKs temporalio/features#242 (Open)


bergundy commented Jan 25, 2023 (edited)


cretz Jan 30, 2023 (edited)


bergundy Feb 1, 2023 (edited)


MichaelSnowden left a comment


alexshtin left a comment
