GEOS-GCM on Discover with GNU - segfault in yafyaml #985

Closed
climbfuji opened this issue Feb 5, 2024 · 16 comments
Labels: bug (Something is not working), INFRA (JEDI Infrastructure)

Comments

@climbfuji
Collaborator

Describe the bug
I can build and run GEOS-GCM with Intel 2021.5.0 on Discover, but with GNU 10.1.0 for the exact same set of libraries, I get the following error:

/gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/GEOSgcm/install-gnu/bin/esma_mpirun: mpi_type = openmpi
/discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/bin/mpirun  -np 24 /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/GEOSgcm/experiments/test-c12-20240204-gnu/scratch/GEOSgcm.x --logging_config logging.yaml
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

Error termination. Backtrace:
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

Error termination. Backtrace:
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

Error termination. Backtrace:
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

Error termination. Backtrace:
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

Error termination. Backtrace:
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

Error termination. Backtrace:
At line 36 of file /discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/cache/build_stage/spack-stage-yafyaml-1.2.0-c5noas5hs72npkhl2bd7ldfka3covowp/spack-src/src/Nodes/Mapping.F90
Fortran runtime error: Recursive call to nonrecursive procedure '__copy_fy_mapping_Omap_i_s_node'

To Reproduce
Build spack-stack develop on Discover for GNU, compile GEOS-GCM, follow instructions to set up GEOS-GCM (gcm_setup and makeoneday.bash), submit job.

Expected behavior
No error - GNU runs to completion like Intel does.

System:
Discover with GNU

Additional context
n/a

climbfuji added the bug (Something is not working) label on Feb 5, 2024

@mathomp4
Collaborator

mathomp4 commented Feb 5, 2024

@climbfuji Hmm. Interesting. This might be a "too old GNU" problem. We of course use GCC 12.1 as our "operational" GCC on the SLES12 machines and we do not test with anything else.

But you might want to try a newer GCC 10 if you want to use that. Maybe GCC 10.3? In the back of my mind I want to say some user had issues with GFE and it was fixed in later 10 variants.

I'll also mention @tclune so he can weigh in.

@climbfuji
Collaborator Author

dheinzel@discover11:/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/GEOSgcm/experiments/test-c12-20240204-gnu> ldd GEOSgcm.x
	linux-vdso.so.1 (0x00007ffd5eafd000)
	/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/esmf-8.6.0-sai32vs/lib/libesmf.so (0x00007f6580029000)
	libGEOS_CatchCNShared.so => not found
	libGEOS_LandShared.so => not found
	libGEOS_SurfaceShared.so => not found
	libopenblas.so.0 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/openblas-0.3.24-l2whjb5/lib/libopenblas.so.0 (0x00007f657f419000)
	libgfortran.so.5 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/gcc-runtime-10.1.0-jyojh2v/lib/libgfortran.so.5 (0x00007f657ef64000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f657ec67000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f657ea63000)
	libgomp.so.1 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/gcc-runtime-10.1.0-jyojh2v/lib/libgomp.so.1 (0x00007f657e824000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f657e607000)
	libCICE_GEOSPlug.so => not found
	libcice6.so => not found
	libCICE4.so => not found
	libMOM6_GEOSPlug.so => not found
	libmom6.so => not found
	libMOM_GEOS5PlugMod.so => not found
	libMAPL.so => not found
	libMAPL.gridcomps.so => not found
	libMAPL.cap.so => not found
	libMAPL.history.so => not found
	libMAPL.ExtData.so => not found
	libMAPL.ExtData2G.so => not found
	libMAPL.orbit.so => not found
	libMAPL.generic.so => not found
	libMAPL.oomph.so => not found
	libMAPL.griddedio.so => not found
	libMAPL.base.so => not found
	libMAPL.pfio.so => not found
	libMAPL.profiler.so => not found
	libMAPL_cfio_r4.so => not found
	libMAPL.field_utils.so => not found
	librt.so.1 => /lib64/librt.so.1 (0x00007f657e3ff000)
	libstdc++.so.6 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/gcc-runtime-10.1.0-jyojh2v/lib/libstdc++.so.6 (0x00007f657e02c000)
	libnetcdf.so.19 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/netcdf-c-4.9.2-ep2oztd/lib/libnetcdf.so.19 (0x00007f657de20000)
	libnetcdff.so.7 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/netcdf-fortran-4.6.1-fxgap5e/lib/libnetcdff.so.7 (0x00007f6581bd7000)
	libpioc.so => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/parallelio-2.6.2-7nsarej/lib/libpioc.so (0x00007f6581b8d000)
	libMAPL.shared.so => not found
	libMAPL.constants.so => not found
	libmom.so => not found
	libfms_r8.so => not found
	libmpi_usempif08.so.40 => /discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/lib/libmpi_usempif08.so.40 (0x00007f657dbdf000)
	libmpi_usempi_ignore_tkr.so.40 => /discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/lib/libmpi_usempi_ignore_tkr.so.40 (0x00007f657d9d0000)
	libmpi_mpifh.so.40 => /discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/lib/libmpi_mpifh.so.40 (0x00007f657d762000)
	libmpi.so.40 => /discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/lib/libmpi.so.40 (0x00007f657d42b000)
	libgcc_s.so.1 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/gcc-runtime-10.1.0-jyojh2v/lib/libgcc_s.so.1 (0x00007f657d213000)
	libquadmath.so.0 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/gcc-runtime-10.1.0-jyojh2v/lib/libquadmath.so.0 (0x00007f657cfcc000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f657cc27000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6581a8e000)
	libhdf5_hl.so.310 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/hdf5-1.14.3-f6oe7ak/lib/libhdf5_hl.so.310 (0x00007f6581b62000)
	libhdf5.so.310 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/hdf5-1.14.3-f6oe7ak/lib/libhdf5.so.310 (0x00007f657c793000)
	libbz2.so.1.0 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/bzip2-1.0.8-6hjios7/lib/libbz2.so.1.0 (0x00007f6581b4f000)
	libzstd.so.1 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/zstd-1.5.2-3gdefmm/lib/libzstd.so.1 (0x00007f657c6b0000)
	libblosc.so.1 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/c-blosc-1.21.5-zljgv4z/lib64/libblosc.so.1 (0x00007f6581b3c000)
	libxml2.so.2 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/libxml2-2.10.3-gduleje/lib/libxml2.so.2 (0x00007f657c54a000)
	libz.so.1 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/zlib-1.2.13-bkfuyvq/lib/libz.so.1 (0x00007f6581b22000)
	liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007f657c324000)
	libcurl.so.4 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/curl-8.4.0-nra4tbo/lib/libcurl.so.4 (0x00007f657c26e000)
	libpnetcdf.so.4 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/parallel-netcdf-1.12.3-7gc5uun/lib/libpnetcdf.so.4 (0x00007f657bcc6000)
	libopen-rte.so.40 => /discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/lib/libopen-rte.so.40 (0x00007f657ba0c000)
	libopen-pal.so.40 => /discover/swdev/jcsda/spack-stack/openmpi-4.1.3/gcc-10.1.0/lib/libopen-pal.so.40 (0x00007f657b6ff000)
	libudev.so.1 => /usr/lib64/libudev.so.1 (0x00007f6581afc000)
	libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007f657b4f5000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007f657b2f2000)
	liblz4.so.1 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/lz4-1.9.4-c7vj67d/lib/liblz4.so.1 (0x00007f657b280000)
	libiconv.so.2 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/libiconv-1.17-ek4whxn/lib/libiconv.so.2 (0x00007f657b174000)
	libnghttp2.so.14 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/nghttp2-1.57.0-jbz7kdc/lib/libnghttp2.so.14 (0x00007f6581aca000)
	libssl.so.3 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/openssl-3.1.3-btuhp54/lib64/libssl.so.3 (0x00007f657b0c8000)
	libcrypto.so.3 => /gpfsm/dnb55/projects/p01/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-10.1.0/install/gcc/10.1.0/openssl-3.1.3-btuhp54/lib64/libcrypto.so.3 (0x00007f657aba0000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f657a97a000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f6581ac2000)
	libpcre.so.1 => /discover/swdev/jcsda/spack-stack/miniconda-3.9.7/lib/libpcre.so.1 (0x00007f657a933000)

@mathomp4
Collaborator

mathomp4 commented Feb 6, 2024

Actually, I wanted the ldd of the other one (the AWS).

As for this, again, you might need to update the GCC version to, say, 10.3 if you want to stay with GNU 10.

@climbfuji
Collaborator Author

@mathomp4 Which of those many MPI options do you use with gcc@12.1.0, and which module do you use for the compiler? I have:

dheinzel@discover12:/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-12.1.0> module load comp/gcc/12.1.0
dheinzel@discover12:/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/spack-stack-geos/envs/unified-env-gcc-12.1.0> module av

----------------------------------------------------------------------------------------------------------------------------------- /usr/local/share/modulefiles/Compiler/comp/gcc/12.1.0 ------------------------------------------------------------------------------------------------------------------------------------
   lib/mkl/17.0.7.259    lib/mkl/19.0.1.144    lib/mkl/19.1.0.166    lib/mkl/20.0.0.166        lib/mkl/2021.4.0    mpi/hpcx/2.4.0-debug        mpi/impi/19.0.0.117    mpi/impi/19.0.5.281    mpi/impi/19.1.3.304    mpi/impi/2021.2.0        mpi/impi/2021.6.0
   lib/mkl/18.0.3.222    lib/mkl/19.0.2.187    lib/mkl/19.1.1.217    lib/mkl/2021.1.1          lib/mkl/2022.0.1    mpi/hpcx/2.4.0       (D)    mpi/impi/19.0.1.144    mpi/impi/19.1.0.166    mpi/impi/20.0.0.154    mpi/impi/2021.3.0        mpi/impi/2021.7.0
   lib/mkl/18.0.5.274    lib/mkl/19.0.4.243    lib/mkl/19.1.2.254    lib/mkl/2021.2.0   (D)    lib/mkl/2022.1.0    mpi/impi/17.0.7.259         mpi/impi/19.0.2.187    mpi/impi/19.1.1.217    mpi/impi/20.0.0.166    mpi/impi/2021.4.0 (D)    mpi/sgi-mpt/2.16
   lib/mkl/19.0.0.117    lib/mkl/19.0.5.281    lib/mkl/19.1.3.304    lib/mkl/2021.3.0          lib/mkl/2022.2.0    mpi/impi/18.0.5.274         mpi/impi/19.0.4.243    mpi/impi/19.1.2.254    mpi/impi/2021.1.1      mpi/impi/2021.5.0        mpi/sgi-mpt/2.17  (D)

Thanks!

@climbfuji
Collaborator Author

The question was answered elsewhere; reposting here:

On SLES12 we use: mpi/openmpi/4.1.3/gcc-12.1.0
For SLES15, we use: mpi/openmpi/4.1.6/gcc-12.3.0

The SI Team maintains these modules. On SLES12:

module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12

On SLES15:

ml use -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
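
As a rough sketch, the mapping above could be wrapped in two small shell helpers. The function names `si_team_modulepath` and `si_team_mpi_module` are hypothetical; only the paths and module names come from this thread. Whether a compiler module has to be loaded first is not stated here, so that step is left out.

```shell
# Hypothetical helpers (names are illustrative, not part of any NCCS tooling).

# Print the SI Team modulefiles directory for a SLES major version (12 or 15).
si_team_modulepath() {
  case "$1" in
    12|15) printf '/discover/swdev/gmao_SIteam/modulefiles-SLES%s\n' "$1" ;;
    *) echo "unsupported SLES version: $1" >&2; return 1 ;;
  esac
}

# Print the Open MPI module quoted in this thread for that SLES version.
si_team_mpi_module() {
  case "$1" in
    12) echo 'mpi/openmpi/4.1.3/gcc-12.1.0' ;;
    15) echo 'mpi/openmpi/4.1.6/gcc-12.3.0' ;;
    *) echo "unsupported SLES version: $1" >&2; return 1 ;;
  esac
}

# Usage on a Discover node (assumes the `module` command is available):
#   module use -a "$(si_team_modulepath 12)"
#   module load "$(si_team_mpi_module 12)"
```

Keeping the lookup separate from the `module` calls means the helpers can be checked on any machine, even one without the module system.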

@mathomp4
Collaborator

mathomp4 commented Feb 7, 2024

Interesting. I never got a notification for either of these. That is...annoying. Time to delve into GitHub settings...

@tclune

tclune commented Feb 8, 2024

I missed this too because it's a ticket in a repository that I generally ignore. Reading now.

@tclune

tclune commented Feb 8, 2024

Ok - looks like a trivial fix - I'm missing yet another RECURSIVE declaration somewhere in the stack. The real question is why we don't see this with GEOS. It's only used by either UFS or GEOS in the ExtData2G package.

Note for the curious: RECURSIVE is default as of Fortran 2018, so technically the software is correct. Not that it helps the end users, but these things help me to sleep a bit better at night. :-)

@mathomp4
Collaborator

mathomp4 commented Feb 8, 2024

@climbfuji Can you point us to the build for that so we can see the post-processed code? (Also, we probably need permissions.)

@climbfuji
Collaborator Author

Thanks for helping, @tclune and @mathomp4. The build is in

/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/GEOSgcm/build-gnu/

and the script I source to set the environment/load the modules before building is

/discover/nobackup/projects/jcsda/s2127/dheinzel/GEOS_20240119/setup-gnu.sh

I think the permissions allow you to access these.

@tclune

tclune commented Feb 8, 2024

Memory starting to come back to me. The procedure that is lacking the RECURSIVE attribute appears to be the intrinsic assignment on the type Omap_i_s_node in module fy_mapping. That is, the __copy_ prefix of the name is apparently how the compiler names the intrinsic assignment operation.

And, of course, one cannot add RECURSIVE to an intrinsic procedure. (One can argue the language should effectively do that for you, but gfortran 10 was either before F2018, or not long after, so ...)

If memory serves, the last time I got here, I tried to create a user-defined overload for ASSIGNMENT(=) for this derived type, but then ran into other compiler issues. I don't know if it was an ICE, or mere rejection of letting me put RECURSIVE there or if there was some other runtime issue.

If important, I could try again, but would rather say "gfortran-10" is broken in terms of required feature sets for GFE/MAPL/GEOS.

@climbfuji
Collaborator Author

Thanks @tclune. I think that's a fair point. We should add a conflict with gcc@10 into our geos-gcm-env virtual package and move on.
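
Until such a conflict is in place, a build script could apply the same rule itself. The following is a minimal sketch, not the spack-stack implementation: the function name `gcc_allowed_for_geos` is hypothetical, and it rejects every 10.x release, mirroring the proposed conflict with gcc@10 rather than singling out 10.1.0.

```shell
# Hypothetical guard (the name is illustrative): succeed unless the GCC
# version string passed as $1 has major version 10.
gcc_allowed_for_geos() {
  major=${1%%.*}   # strip everything after the first dot, e.g. 10.1.0 -> 10
  if [ "$major" = "10" ]; then
    echo "gcc $1 rejected: GCC 10 is treated as conflicting with GEOS-GCM" >&2
    return 1
  fi
  return 0
}

# Usage before configuring a build:
#   gcc_allowed_for_geos "$(gfortran -dumpfullversion)" || exit 1
```

Matching on the major version only keeps the check as coarse as the conflict described above.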

@mathomp4
Collaborator

mathomp4 commented Feb 8, 2024

Confirmed that I can run GEOSgcm main of today using GCC 10.3.0:

/discover/nobackup/mathomp4/Experiments/gcc10.3-2024Feb08-1day-c24-SLES12

Modules were:

comp/gcc/10.3.0
mpi/openmpi/4.0.6/gcc-10.3.0
lib/mkl/2021.4.0
python/GEOSpyD/Min4.11.0_py3.9_AND_Min4.8.3_py2.7

though I suppose only the first two matter.

@mathomp4
Collaborator

mathomp4 commented Feb 8, 2024

Note: only on SLES12. NCCS hasn't installed GCC 10 on SLES15, probably because there has been no call for it.

climbfuji added the INFRA (JEDI Infrastructure) label on Feb 8, 2024

@climbfuji
Collaborator Author

Thanks @mathomp4. I'll close this issue and we will move to a newer gcc (probably straight to 12) on SLES12 (= SCU15/16 = cascadelake).

@climbfuji
Collaborator Author

This is already closed, but for the record, updating to gcc@12.1.0 on SLES12 allowed me to build and run GEOS-GCM.
