Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite JS Transformations to use Maps instead of Proxy Objects #1337

Merged

Conversation

AndreKurait
Copy link
Member

@AndreKurait AndreKurait commented Mar 10, 2025

Description

Improves performance and usability of Javascript Transforms by rewriting JS Transformations to use Maps instead of Proxy Objects.

This alone results in reduced performance hit with the Type Mapping Sanitization Transformer for RFS from 43% -> 18%.

Furthermore implemented batching with RFS transformations allowing RFS to send up to 100 documents at a time to downstream transformations which further reduced the performance hit for the Type Mapping Sanitization Transformer from 18% -> ~9%. To support this with existing non-batch-supporting transforms, added FlatteningJsonArrayTransformer which can wrap an existing transform to enable it to support the RFS batches.

In commit demonstrated unit testing javascript code with the type conversion from java -> javascript.

Issues Resolved

MIGRATIONS-2392

Testing

Added unit tests, ran in the cloud to verify overall performance improvements.

image

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Mar 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.19%. Comparing base (56bf099) to head (fd5e72b).
Report is 58 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1337      +/-   ##
============================================
+ Coverage     79.46%   80.19%   +0.72%     
+ Complexity     3263     2979     -284     
============================================
  Files           468      440      -28     
  Lines         16934    16184     -750     
  Branches       1115     1078      -37     
============================================
- Hits          13457    12978     -479     
+ Misses         2805     2570     -235     
+ Partials        672      636      -36     
Flag Coverage Δ
unittests 80.19% <ø> (+0.72%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@AndreKurait AndreKurait changed the title Transformation performance Rewrite JS Transformations to use Maps over Proxy Objects Mar 12, 2025
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
…k transforms with existing transformations

Signed-off-by: Andre Kurait <akurait@amazon.com>
@AndreKurait AndreKurait force-pushed the TransformationPerformance branch from ce8b943 to 10d6c2d Compare March 12, 2025 15:15
Signed-off-by: Andre Kurait <akurait@amazon.com>
… converter

Signed-off-by: Andre Kurait <akurait@amazon.com>
@AndreKurait AndreKurait force-pushed the TransformationPerformance branch from bffc95a to 322bce2 Compare March 12, 2025 15:45
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
@@ -20,6 +20,19 @@ function isEnabled(features, path) {
return false;
}

function retargetCommandParameters(parameters, targetIndex) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved in order to show up in commit history to demonstrate unit testing capabilities

Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
@AndreKurait AndreKurait force-pushed the TransformationPerformance branch from 81a45ab to 2c10475 Compare March 12, 2025 22:31
@AndreKurait AndreKurait changed the title Rewrite JS Transformations to use Maps over Proxy Objects Rewrite JS Transformations to use Maps instead of Proxy Objects Mar 14, 2025
@AndreKurait AndreKurait force-pushed the TransformationPerformance branch from 2c10475 to 88433d7 Compare March 14, 2025 17:04
@@ -34,7 +34,8 @@ public Flux<WorkItemCursor> reindex(String indexName, Flux<RfsLuceneDocument> do
var scheduler = Schedulers.newParallel("DocumentBulkAggregator");
var rfsDocs = documentStream
.publishOn(scheduler, 1)
.concatMapIterable(doc -> transformDocument(threadSafeTransformer, doc, indexName));
.buffer(Math.min(100, maxDocsPerBulkRequest)) // arbitrary
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be as big as the batch that we send out over the wire?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, particularly for transformations that dramatically increase or decrease the size of the docs. Also, at this point we don't know how many documents we'll include over the wire, as most the time we construct it by a size instead of number

Comment on lines +77 to +81
return new Map([
["method", "GET"],
["URI", "/"],
["protocol", "HTTP/1.0"]
]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that we can send in a more constrained or different value, but should we prevent a user from being able to construct what that was? I'd suspect that this degree of friction will cause irritation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following this comment regarding this function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why this changed. Sounds like this format or the prior one will work for customers.

Signed-off-by: Andre Kurait <akurait@amazon.com>
@AndreKurait AndreKurait force-pushed the TransformationPerformance branch from 88433d7 to fac4405 Compare March 18, 2025 17:23
Signed-off-by: Andre Kurait <akurait@amazon.com>
" }; " +
"}" +
"main";
private static final String INIT_SCRIPT_2 = "\n" +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could just be ""+ too. It might be nice to put newlines within the string (like before 'main'). I will look at these const strings when debugging.

Comment on lines +77 to +81
return new Map([
["method", "GET"],
["URI", "/"],
["protocol", "HTTP/1.0"]
]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why this changed. Sounds like this format or the prior one will work for customers.

Signed-off-by: Andre Kurait <akurait@amazon.com>
@AndreKurait AndreKurait merged commit f913813 into opensearch-project:main Mar 20, 2025
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants