Fix: fix potential memory leak in getDirectQueryConnections with node 20 #9575

Merged
merged 2 commits into from
Mar 20, 2025

Conversation

zhongnansu
Member

@zhongnansu zhongnansu commented Mar 19, 2025

Description

Fix a potential memory leak in getDirectQueryConnections with Node 20.

  1. Enforce a 5s timeout for the legacy client call on ppl.dataSource.
  2. Use a larger default client pool size of 10. Note that if a single page loads 10+ data sources, this could still crash the server, because the elasticsearch legacy library uses a deprecated parseUrl function from url. Addressing this requires deprecating usage of the elasticsearch client, which is not currently in scope for the 3.0 release.
(node:504350) [DEP0170] DeprecationWarning: The URL search-data-logs-2-xrnwkouy6zb62xpw5vaarfndm4.aos.us-west-2.on.aws:443::::::::true::::::::::::: is invalid. Future versions of Node.js will throw an error.
(Use node --trace-deprecation ... to show where the warning was created)
Node.js process-warning detected:

DeprecationWarning: The URL search-data-logs-2-xrnwkouy6zb62xpw5vaarfndm4.aos.us-west-2.on.aws:443::::::::true::::::::::::: is invalid. Future versions of Node.js will throw an error.
    at getHostname (node:url:517:17)
    at Url.parse (node:url:385:14)
    at urlParse (node:url:142:13)
    at /home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/connectors/http.js:78:34
    at arrayEach (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/lodash/lodash.js:530:11)
    at Function.forEach (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/lodash/lodash.js:9410:14)
    at HttpConnector.onStatusSet (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/connectors/http.js:75:7)
    at HttpConnector.wrapper (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/lodash/lodash.js:4991:19)
    at HttpConnector.emit (node:events:530:35)
    at HttpConnector.emit (node:domain:489:12)
    at HttpConnector.ConnectionAbstract.setStatus (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/connection.js:102:8)
    at ConnectionPool.removeConnection (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/connection_pool.js:325:16)
    at ConnectionPool.setHosts (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/connection_pool.js:357:10)
    at ConnectionPool.close (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/connection_pool.js:371:8)
    at Transport.close (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/transport.js:521:23)
    at EsApiClient.close (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/elasticsearch/src/lib/client.js:57:22)
    at LRUCache.dispose (/home/ubuntu/Projects/OpenSearch-Dashboards/src/plugins/data_source/server/client/client_pool.ts:48:24)
    at del (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/lru-cache/index.js:453:20)
    at trim (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/lru-cache/index.js:443:7)
    at LRUCache.set (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/lru-cache/index.js:343:3)
    at addClientToPool (/home/ubuntu/Projects/OpenSearch-Dashboards/src/plugins/data_source/server/client/client_pool.ts:85:31)
    at getQueryClient (/home/ubuntu/Projects/OpenSearch-Dashboards/src/plugins/data_source/server/legacy/configure_legacy_client.ts:158:7)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at configureLegacyClient (/home/ubuntu/Projects/OpenSearch-Dashboards/src/plugins/data_source/server/legacy/configure_legacy_client.ts:75:12)
    at /home/ubuntu/Projects/OpenSearch-Dashboards/src/plugins/data_source_management/server/routes/data_connections_router.ts:225:37
    at Router.handle (/home/ubuntu/Projects/OpenSearch-Dashboards/src/core/server/http/router/router.ts:286:44)
    at handler (/home/ubuntu/Projects/OpenSearch-Dashboards/src/core/server/http/router/router.ts:241:11)
    at exports.Manager.execute (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/home/ubuntu/Projects/OpenSearch-Dashboards/node_modules/@hapi/hapi/lib/request.js:281:9)

Terminating process...
 server crashed  with status code 1

Issues Resolved

#9459

Screenshot

image

Testing the changes

Changelog

  • fix: fix potential memory leak in getDirectQueryConnections

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

Contributor

❌ Empty Changelog Section

The Changelog section in your PR description is empty. Please add a valid changelog entry or entries. If you did add a changelog entry, check to make sure that it was not accidentally included inside the comment block in the Changelog section.

opensearch-changeset-bot bot added a commit to zhongnansu/OpenSearch-Dashboards that referenced this pull request Mar 19, 2025
@zhongnansu zhongnansu added the nodejs 🍭 issues related to nodejs client label Mar 19, 2025
@zhongnansu zhongnansu changed the title from "Fix: fix potential memory leak in getDirectQueryConnections" to "Fix: fix potential memory leak in getDirectQueryConnections with node 20" Mar 19, 2025
@zhongnansu zhongnansu marked this pull request as ready for review March 19, 2025 22:25

codecov bot commented Mar 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.81%. Comparing base (0433807) to head (ee1fd94).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9575      +/-   ##
==========================================
- Coverage   61.82%   61.81%   -0.02%     
==========================================
  Files        3825     3825              
  Lines       92058    92058              
  Branches    14602    14602              
==========================================
- Hits        56916    56902      -14     
- Misses      31469    31482      +13     
- Partials     3673     3674       +1     
Flag Coverage Δ
Linux_1 28.94% <ø> (ø)
Linux_2 56.38% <ø> (ø)
Linux_3 39.42% <ø> (+<0.01%) ⬆️
Linux_4 28.84% <ø> (ø)
Windows_1 28.96% <ø> (-0.02%) ⬇️
Windows_2 56.33% <ø> (ø)
Windows_3 39.42% <ø> (ø)
Windows_4 28.84% <ø> (ø)


for await (const datasource of dataSources) {
await getDirectQueryConnections(datasource.id, http!)
.then((connections) => directQueryConnections.push(...connections))
.catch(() => directQueryConnections.push([]));
Member

Do we want to swallow exceptions here?


for await (const datasource of dataSources) {
await getDirectQueryConnections(datasource.id, http!)
.then((connections) => directQueryConnections.push(...connections))
Member

This removes the parallel execution. Would it cause performance issues?

@@ -222,7 +222,9 @@ export function registerDataConnectionsRoute(router: IRouter, dataSourceEnabled:
let dataConnectionsresponse;
if (dataSourceEnabled && dataSourceMDSId) {
const client = await context.dataSource.opensearch.legacy.getClient(dataSourceMDSId);
- dataConnectionsresponse = await client.callAPI('ppl.getDataConnections');
+ dataConnectionsresponse = await client.callAPI('ppl.getDataConnections', {
+ requestTimeout: 2000, // Enforce timeout to avoid hanging requests
Member

This is a normal OpenSearch call. Do you know why it's timing out? Is it when the cluster is deleted or unreachable but the MDS reference still exists? Also, 2 seconds might be too short.

const directQueryConnections = directQueryConnectionsResult.flat();
const directQueryConnections: DataSourceTableItem[] = [];

for await (const datasource of dataSources) {
Collaborator

Call out: aren't we changing this from parallel requests to sequential ones? Does this impact the experience at all?
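
The trade-off raised in this thread can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `fakeFetch` is a hypothetical stand-in for `getDirectQueryConnections`, and the sketch counts how many calls are in flight at once instead of measuring wall-clock time.

```typescript
// Hypothetical stand-in for getDirectQueryConnections: resolves after a short
// delay and records the peak number of calls in flight at the same time.
function makeTracker() {
  let inFlight = 0;
  let peak = 0;
  const fakeFetch = async (id: string): Promise<string[]> => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    inFlight -= 1;
    return [`${id}-connection`];
  };
  return { fakeFetch, getPeak: () => peak };
}

// Sequential: each call starts only after the previous one has settled.
async function fetchSequentially(ids: string[]) {
  const { fakeFetch, getPeak } = makeTracker();
  const results: string[] = [];
  for (const id of ids) {
    results.push(...(await fakeFetch(id)));
  }
  return { results, peak: getPeak() };
}

// Parallel: every call starts immediately; Promise.all waits for all of them
// and keeps the results in input order.
async function fetchInParallel(ids: string[]) {
  const { fakeFetch, getPeak } = makeTracker();
  const nested = await Promise.all(ids.map((id) => fakeFetch(id)));
  const results = nested.reduce<string[]>((acc, r) => acc.concat(r), []);
  return { results, peak: getPeak() };
}
```

With N data sources, the sequential version never has more than one request open, while the parallel version opens N at once; total latency moves in the opposite direction.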

for await (const datasource of dataSources) {
await getDirectQueryConnections(datasource.id, http!)
.then((connections) => directQueryConnections.push(...connections))
.catch(() => directQueryConnections.push([]));
Collaborator

before, empty arrays were flattened out - is this handled downstream?
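
To make the question concrete, here is a small sketch of the difference, using a hypothetical `Item` type rather than the real `DataSourceTableItem`. Flattening per-source results drops empty arrays, whereas pushing `[]` into an already flat list adds the empty array itself as an element.

```typescript
type Item = { id: string };

// Per-data-source results, where the second source failed and yielded [].
const perSource: Item[][] = [[{ id: 'a' }], [], [{ id: 'b' }]];

// Earlier approach: collect nested results, then flatten one level.
// Empty arrays contribute nothing to the output.
const flattened: Item[] = perSource.flat();

// Loop approach with push([]) on failure: the empty array becomes an element,
// so the list is no longer made up purely of Item objects.
const pushed: Array<Item | Item[]> = [];
for (const result of perSource) {
  if (result.length > 0) {
    pushed.push(...result);
  } else {
    pushed.push([]); // mirrors `.catch(() => directQueryConnections.push([]))`
  }
}
```

Here `flattened` holds two items, while `pushed` holds three entries, one of which is an empty array that downstream code would need to tolerate or filter out.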

Member Author

Actually, I just realized that applying the correct timeout alone fixes the issue. If the promises time out faster, they won't pile up in memory under Promise.all and cause the memory issue. I just reverted the change to keep the original Promise.all() for the sake of performance. Also, a 5s timeout for these specific calls on the DSM table page is reasonable: the call only verifies that the endpoint is reachable, so there is no need to wait for the default 30s timeout when calling a single data source endpoint.
cc: @joshuali925 @huyaboo
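
The reasoning above can be illustrated with a generic sketch. It does not use the legacy client or its `requestTimeout` option; `withTimeout`, `slowCall`, and `fetchAll` are hypothetical helpers. Note one assumption that differs from a client-level timeout: this wrapper only stops waiting for the slow call, it does not cancel the underlying work.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

// Hypothetical slow endpoint: resolves with [id] after `delayMs`.
function slowCall(id: string, delayMs: number): Promise<string[]> {
  return new Promise((resolve) => setTimeout(() => resolve([id]), delayMs));
}

// Run all calls in parallel. A call that times out contributes an empty
// result instead of keeping Promise.all pending until a long default timeout.
async function fetchAll(
  sources: Array<{ id: string; delayMs: number }>,
  timeoutMs: number
): Promise<string[]> {
  const nested = await Promise.all(
    sources.map((s) =>
      withTimeout(slowCall(s.id, s.delayMs), timeoutMs).catch((): string[] => [])
    )
  );
  return nested.reduce<string[]>((acc, r) => acc.concat(r), []);
}
```

For example, `fetchAll([{ id: 'fast', delayMs: 5 }, { id: 'slow', delayMs: 300 }], 50)` settles once the 50 ms timeout fires and returns only the fast source's result, so pending promises are held for a bounded time while the calls stay parallel.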

Signed-off-by: Zhongnan Su <szhongna@amazon.com>
joshuali925
joshuali925 previously approved these changes Mar 19, 2025
@@ -41,7 +41,7 @@ export const configSchema = schema.object({
),
}),
clientPool: schema.object({
- size: schema.number({ defaultValue: 5 }),
+ size: schema.number({ defaultValue: 10 }),
Collaborator

Does this value mean we keep 10 active connections to the OpenSearch cluster?

Member Author

At most 10; it has a mechanism to clean up stale ones.
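
A minimal sketch of that kind of bounded pool, assuming a simple Map-based LRU instead of the `lru-cache` package that appears in the stack trace in the description. `FakeClient` and `ClientPool` are hypothetical names for illustration and are not the plugin's actual classes.

```typescript
interface Closable {
  close(): void;
}

// Hypothetical client that only records whether it was closed.
class FakeClient implements Closable {
  closed = false;
  constructor(readonly endpoint: string) {}
  close(): void {
    this.closed = true;
  }
}

class ClientPool<C extends Closable> {
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly clients = new Map<string, C>();

  constructor(private readonly maxSize: number) {}

  get(key: string): C | undefined {
    const client = this.clients.get(key);
    if (client !== undefined) {
      // Re-insert to mark this client as most recently used.
      this.clients.delete(key);
      this.clients.set(key, client);
    }
    return client;
  }

  set(key: string, client: C): void {
    const existing = this.clients.get(key);
    if (existing !== undefined) {
      this.clients.delete(key);
      if (existing !== client) existing.close();
    }
    this.clients.set(key, client);
    // Evict least recently used clients beyond the limit and close them,
    // similar in spirit to a dispose callback on an LRU cache.
    while (this.clients.size > this.maxSize) {
      const oldestKey = this.clients.keys().next().value as string;
      const oldest = this.clients.get(oldestKey);
      this.clients.delete(oldestKey);
      if (oldest !== undefined) oldest.close();
    }
  }

  get size(): number {
    return this.clients.size;
  }
}
```

With a limit of 10, an eleventh distinct data source evicts and closes the least recently used client, which is the point in the stack trace above where the legacy client's `close()` runs.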

@zhongnansu zhongnansu merged commit 14813de into opensearch-project:main Mar 20, 2025
73 of 74 checks passed
5 participants