Updating to ray2 #1982

Merged
wangcj05 merged 14 commits into idaholab:devel from update_to_ray2
Nov 28, 2022

@joshua-cogliati-inl
Contributor

joshua-cogliati-inl commented Sep 29, 2022


Pull Request Description

What issue does this change request address?

#1972

What are the significant changes in functionality due to this change request?

  • Updates ray to version 2.
  • Always passes PYTHONPATH to ray init.
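
As a rough illustration of the second change, the sketch below builds keyword arguments for `ray.init` that always forward `PYTHONPATH` to the workers through Ray's `runtime_env` `env_vars` setting. This is an assumption about one way to do it, not a copy of the code in this PR; the helper name `build_ray_init_kwargs` and the example path are hypothetical.

```python
import os


def build_ray_init_kwargs(num_cpus, include_dashboard=False, environ=None):
    """Build keyword arguments for ray.init that always carry PYTHONPATH.

    Hypothetical helper for illustration only; it is not part of RAVEN.
    """
    environ = os.environ if environ is None else environ
    return {
        "num_cpus": int(num_cpus),
        "include_dashboard": include_dashboard,
        # Always forward PYTHONPATH (an empty string if unset) so that the
        # ray workers can import the same modules as the driver process.
        "runtime_env": {"env_vars": {"PYTHONPATH": environ.get("PYTHONPATH", "")}},
    }


if __name__ == "__main__":
    # "/path/to/raven" is a placeholder path, not taken from the PR.
    kwargs = build_ray_init_kwargs(4, environ={"PYTHONPATH": "/path/to/raven"})
    print(kwargs)
    # With ray installed, the call would then be: ray.init(**kwargs)
```

Keeping the argument construction separate from the `ray.init` call makes it possible to check what is passed without starting a ray instance.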


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.). See the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added that sets the <internalParallel> node to True in the XML block.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added, is the analytic documentation updated/added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) has been changed, the associated documentation must be reviewed to ensure the text matches the example.

@moosebuild

Job Precheck on 743a8f0 : invalidated by @joshua-cogliati-inl

failed in fetch

@moosebuild

Job Precheck on 743a8f0 : invalidated by @milljm

@moosebuild

Job Precheck on 743a8f0 : invalidated by @joshua-cogliati-inl

failed in fetch

@moosebuild

Job Test qsubs sawtooth on ca0493d : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/cluster_tests/AdaptiveSobol/test_parallel_adaptive_sobol

@joshua-cogliati-inl
Contributor Author

Hm, mac failed with:

(    0.14 sec) Job Handler              : DEBUG           -> Initializing ray locally with num_cpus:  4
2022-09-30 14:55:25,799	ERROR node.py:742 -- Unable to succeed in selecting a random port.
Traceback (most recent call last):
  File "/Users/civet/civet/build_0/raven/raven_framework.py", line 26, in <module>
    sys.exit(main(True))
  File "/Users/civet/civet/build_0/raven/ravenframework/Driver.py", line 203, in main
    raven()
  File "/Users/civet/civet/build_0/raven/ravenframework/Driver.py", line 155, in raven
    simulation.initialize()
  File "/Users/civet/civet/build_0/raven/ravenframework/Simulation.py", line 543, in initialize
    self.jobHandler.initialize()
  File "/Users/civet/civet/build_0/raven/ravenframework/JobHandler.py", line 140, in initialize
    self.__initializeRay()
  File "/Users/civet/civet/build_0/raven/ravenframework/JobHandler.py", line 224, in __initializeRay
    self.rayServer = ray.init(num_cpus=int(self.runInfoDict['totalNumCoresUsed']),include_dashboard=db) if _rayAvail else \
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/worker.py", line 1420, in init
    _global_node = ray._private.node.Node(
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/node.py", line 267, in __init__
    self._ray_params.update_pre_selected_port()
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/parameter.py", line 326, in update_pre_selected_port
    raise ValueError(
ValueError: Ray component dashboard_agent_grpc is trying to use a port number 65534 that is used by other components.
Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 65534, 'client_server': 'random', 'dashboard': 'random', 'dashboard_agent_grpc': 65534, 'dashboard_agent_http': 52365, 'metrics_export': 65535, 'redis_shards': 'random', 'worker_ports': 'random'}
If you allocate ports, please make sure the same port is not used by multiple components.

Running test failed with exit code -15
(678F1/819) Failed (  7.69sec)tests/framework/InternalParallelTests/ROMscikit
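
To make the failure above easier to read, here is a small diagnostic sketch that scans the port table printed in the `ValueError` and reports which components were assigned the same fixed port. The dictionary is copied from the log; the helper name `find_port_conflicts` is hypothetical, and the snippet only interprets the message, it does not change how ray picks ports.

```python
from collections import defaultdict

# Port information copied from the ValueError message in the log above.
PORT_INFO = {
    'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random',
    'gcs_server': 65534, 'client_server': 'random', 'dashboard': 'random',
    'dashboard_agent_grpc': 65534, 'dashboard_agent_http': 52365,
    'metrics_export': 65535, 'redis_shards': 'random', 'worker_ports': 'random',
}


def find_port_conflicts(port_info):
    """Return {port: [components]} for every fixed port claimed more than once."""
    by_port = defaultdict(list)
    for component, port in port_info.items():
        # Entries marked 'random' have no fixed port yet, so skip them.
        if isinstance(port, int):
            by_port[port].append(component)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    print(find_port_conflicts(PORT_INFO))
    # → {65534: ['gcs_server', 'dashboard_agent_grpc']}
```

In this log, `gcs_server` and `dashboard_agent_grpc` were both given port 65534, which is the collision the exception reports.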

@moosebuild

Job Test mac on ca0493d : invalidated by @joshua-cogliati-inl

ValueError: Ray component dashboard_agent_grpc is trying to use a port number 65534 that is used by other components.

@moosebuild

Job Test Ubuntu 16 on 743a8f0 : invalidated by @joshua-cogliati-inl

This is a \test\for\Everything

@moosebuild

Job Test Ubuntu 16 on 743a8f0 : canceled by @joshua-cogliati-inl

f: \ stuff

@moosebuild

Job Test Ubuntu 16 on 743a8f0 : invalidated by @joshua-cogliati-inl

Restarting.

@moosebuild

Job Test Ubuntu 16 on ca0493d : invalidated by @joshua-cogliati-inl

restarted civet

@wangcj05
Collaborator

wangcj05 commented Oct 5, 2022

Windows failed.

@moosebuild

Job Test Fedora 31 on 6bbda33 : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/framework/PostProcessors/EconomicRatio/timeDepDataset

@moosebuild

Job Mingw Test on 6bbda33 : invalidated by @wangcj05

testing

@wangcj05 added the RAVENv2.2 (for RAVENv2.2 Release) and RAVENv2.3 (for RAVEN 2.3 Release) labels and removed the RAVENv2.2 (for RAVENv2.2 Release) label on Nov 16, 2022
@moosebuild

Job Test Ubuntu 18 PIP on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test CentOS 8 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test Fedora 31 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test Fedora 32 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test Ubuntu 16 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test Ubuntu 18-2 Python 3 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test Ubuntu 20-2 Optional on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

@moosebuild

Job Test Fedora 31 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

@moosebuild

Job Test Fedora 32 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

@moosebuild

Job Test Ubuntu 16 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

@moosebuild

Job Test Ubuntu 20-2 Optional on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

@moosebuild

Job Test Ubuntu 18-2 Python 3 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

@wangcj05
Collaborator

wangcj05 left a comment

PR looks good.

@wangcj05
Collaborator

The checklist is good and the tests are green. The PR can be merged.

@wangcj05 wangcj05 merged commit ea9c8a9 into idaholab:devel Nov 28, 2022
@joshua-cogliati-inl joshua-cogliati-inl deleted the update_to_ray2 branch November 28, 2022 16:47
Labels
RAVENv2.3 for RAVEN 2.3 Release