Licsbas02_ml_prep infinite run #91
Please try Also could you tell me the details of your environment below?
|
Thanks, I tried with 1 processor but the result is the same. My configuration is the following: Ubuntu 12.04 LTS. I used the same configuration with the March 2020 version and it worked. The only change I had to make was updating GDAL to meet the requirements. |
How about nlook=10 and n_para=1? The multiprocessing module sometimes seems to get stuck on a cluster. In many cases n_para=1 avoids the problem. I have no idea how to fix it because I have no cluster and cannot reproduce the situation... |
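For readers who hit the same stall, the idea behind the n_para=1 workaround can be sketched as a parallel run that falls back to serial processing when the pool stops responding. This is only an illustration, not LiCSBAS code: `process_ifg`, `run_with_fallback`, and `timeout_s` are hypothetical names, and the timeout value would have to be chosen by the user.

```python
import multiprocessing as mp


def process_ifg(index):
    """Hypothetical stand-in for the per-interferogram work of one step."""
    return index * 2


def run_with_fallback(func, items, n_para, timeout_s):
    """Run func over items with n_para workers; go serial if the pool stalls."""
    items = list(items)
    if n_para <= 1:
        # Same effect as passing n_para=1: no worker processes at all.
        return [func(item) for item in items]
    # Leaving the with-block terminates any worker that is still alive.
    with mp.Pool(n_para) as pool:
        async_result = pool.map_async(func, items)
        try:
            return async_result.get(timeout=timeout_s)
        except mp.TimeoutError:
            pass  # the pool did not finish in time; fall through to serial
    return [func(item) for item in items]
```

A call such as `run_with_fallback(process_ifg, range(141), n_para=8, timeout_s=600)` would try 8 workers first and redo the work serially after 10 minutes without a result. The serial rerun repeats all items, so it only suits work that is safe to run twice.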
Hello Yu, I get the same result with nlook=10 and n_para=1. I tried the script with other tracks and it stopped at the same point. I spoke with IT about the cluster, but they said that it is a normal disk on a server. It might be Ubuntu 12.04 that causes the issue. Thanks a lot for your help. |
What is the frame ID? |
Have you tried again after removing GEOCml*? |
Initially I tried with 066D_04410_131313, and later I used one that I had already processed, 157D_07909_131307. I created a new folder, and yes, each time I rerun the batch I delete all the folders except the interferograms (rm -r GACOS/ GEOCml1* log). I also set GACOS = no for both GACOS options. I am installing LiCSBAS on another machine with a more recent Ubuntu. I will let you know the outcome. |
I have tested the frames and the processing finished successfully. Perhaps one of the files downloaded to your GEOC dir is corrupted. I suggest removing the GEOC dir and trying again with nlook=10. |
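Before deleting the whole GEOC dir, a quick check for obviously broken downloads can be sketched as follows. This is a minimal sketch, not part of LiCSBAS: `find_suspect_tifs` is a hypothetical helper, and it only catches empty files or files whose first bytes are not a TIFF signature. A file truncated further in would pass, so re-downloading remains the safer fix.

```python
from pathlib import Path

# Signatures of classic TIFF and BigTIFF, little- and big-endian.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def find_suspect_tifs(geoc_dir):
    """Return the *.tif files under geoc_dir that do not start like a TIFF."""
    suspects = []
    for tif in sorted(Path(geoc_dir).rglob("*.tif")):
        with open(tif, "rb") as f:
            header = f.read(4)  # an empty file yields b"" and is flagged
        if header not in TIFF_MAGIC:
            suspects.append(tif)
    return suspects
```

Calling `find_suspect_tifs("GEOC")` from the processing directory lists the candidates to remove and download again.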
Hello Yu, we are installing a new server to test it. I will let you know when we have the first results. Cheers,
|
Hey Yu,
We installed on a new server (Ubuntu 20) and the parallelization works fine!
However, we have an issue at step 13 (frame: 157D_07909_131307_rincon_2021):
Running 1740000/1740469th point...
Traceback (most recent call last):
  File "/respaldo/InSAR/LiCSAR/LiCSBAS-master_2021/LiCSBAS/bin/LiCSBAS13_sb_inv.py", line 996, in <module>
    sys.exit(main())
  File "/respaldo/InSAR/LiCSAR/LiCSBAS-master_2021/LiCSBAS/bin/LiCSBAS13_sb_inv.py", line 731, in main
    gap[:, rows[0]:rows[1], :] = gap_patch.reshape((n_im-1, lengththis, width))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/root/.pyenv/versions/miniconda3-latest/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 708, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
  File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
OSError: Can't write data (file write failed: time = Tue Mar 23 11:56:02 2021
, filename = '/respaldo/InSAR/LiCSAR/157D_07909_131307_rincon_2021/TS_GEOCml1/cum.h5', file descriptor = 5, errno = 5, error message = 'Input/output error', buf = 0x55aead837b80, total write size = 80, bytes this sub-write = 80, bytes actually written = 18446744073709551615, offset = 2236)
Traceback (most recent call last):
  File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
OSError: Driver write request failed (file write failed: time = Tue Mar 23 11:56:02 2021
, filename = '/respaldo/InSAR/LiCSAR/157D_07909_131307_rincon_2021/TS_GEOCml1/cum.h5', file descriptor = 5, errno = 5, error message = 'Input/output error', buf = 0x55aea35266f0, total write size = 725, bytes this sub-write = 725, bytes actually written = 18446744073709551615, offset = 2316)
Exception ignored in: 'h5py._objects.ObjectID.__dealloc__'
Traceback (most recent call last):
  File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
OSError: Driver write request failed (file write failed: time = Tue Mar 23 11:56:02 2021
, filename = '/respaldo/InSAR/LiCSAR/157D_07909_131307_rincon_2021/TS_GEOCml1/cum.h5', file descriptor = 5, errno = 5, error message = 'Input/output error', buf = 0x55aea35266f0, total write size = 725, bytes this sub-write = 725, bytes actually written = 18446744073709551615, offset = 2316)
Any clue?
|
I have never seen this error. Could you upload the full log file? |
Hello Yu,
Here is the log file.
|
Where is the log file? |
It was attached to my email, but apparently GitHub does not accept emails with attached files. Now, from GitHub, it may work better. |
I have tried processing the same data but could not reproduce the error. The error is related to writing to cum.h5. Please try again after removing the existing cum.h5. |
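To separate a storage problem from a LiCSBAS problem, one option is to probe the output directory with a plain write outside of h5py. The sketch below rests on assumptions: `probe_write` is a hypothetical helper, and a clean result does not prove the disk is healthy. The two constants only decode the numbers in the traceback: errno 5 is EIO, and the "bytes actually written" value is what -1, a failed write, looks like when printed as an unsigned 64-bit integer.

```python
import errno
import os
import tempfile

# errno = 5 in the traceback is the generic low-level I/O failure.
ERRNO_5_NAME = errno.errorcode[5]

# A write() return value of -1 printed as an unsigned 64-bit integer.
UNSIGNED_MINUS_ONE = (-1) % 2**64  # 18446744073709551615


def probe_write(directory, size=1024 * 1024):
    """Write and fsync a scratch file in directory.

    Returns None on success, otherwise the symbolic errno name
    (for example "EIO", "EACCES" or "ENOSPC").
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"\0" * size)
            f.flush()
            os.fsync(f.fileno())  # force the data down to the storage layer
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

Running `probe_write("TS_GEOCml1")` on the server-mounted disk and on a local disk would show whether the failure follows the storage rather than the script.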
Hello Yu,
a) I deleted cum.h5 and reran, but I get the same error. I attached the new log as well as the batch file, in case you can detect something suspicious.
b) I also tried with another area and a different frame (084D_08014_091312) and got the same error. I added the log file and the batch file.
c) These two runs were done from an Ubuntu 20.04 virtual machine, and the disk was on a server. I made another attempt, downloading LiCSBAS from GitHub locally on the virtual machine. Then I started from zero for a new area, locally. I used the provided batch file, and the only edit I made is p05_clip_range_geo="-84.80/-84.50/9.96/10.25". I could avoid the previous error, but I got stuck at "Identifing gaps, and counting n_gap and n_ifg_noloop, with 4 parallel processing...". At this stage the script seems to run indefinitely (CPU usage 0%; I waited 4 hours, while the other steps last several minutes). This error is similar to the first one with Ubuntu 12.04, although at a later step. Here is the log:
I also tried with these frames: 092A_07941_091203 and 084D_08014_091312. The process also stops, but later, at step 16. Here:
Size of image (w,l) : 2738, 2853
HP filter in time, LP filter in space,
Trying with different parameters, for instance: Change the step when the process stops.
Well, this is what I tried over the last days. It does not make any sense to me, but I hope that it does to you, Yu. If not, don't worry, I will try to find the solution. Once again, thank you for your patience. |
Could you show me the output of LiCSBAS_check_install.py which includes the version of the modules? |
(miniconda3-latest) root@ovsicori-virtual-machine:~/LiCSBAS_processing/092A_07941_091203# LiCSBAS_check_install.py
Python version: 3.8.2
Check required modues and versions
Check LiCSBAS commands
Check LiCSBAS library
LiCSBAS install is OK
[1]+ Done LiCSBAS_plot_ts.py -i TS_GEOCml1/cum_filt.h5 |
The module versions are completely the same as mine. I have no idea about the error in step13. For the stop in step16, please try n_para=1 (see #86). |
Yes it works! That's great Yu! |
I am glad to hear that. I would appreciate it if you could solve the error in step 13 and post how you solved it. |
Hello Yu, for step 13, I just worked locally instead of on a server. I am not sure if it is an issue with write permissions or something else. I am currently away, but I will check it next week when I am back. |
Hello Yu, I am back from holidays. I have more time now. |
This thread would be fine. |
Hello, dear users,
LiCSBAS02_ml_prep.py ver1.7.4 20201119 Y. Morishita
Create E.geo
Create N.geo
Create U.geo
Create slc.mli
Create hgt
Create unw and cc
Create slc.mli.par |
Hello Yu,
I installed the new version; I had previously run LiCSBAS with the March 2020 version. The installation check script reported that the installation is OK. I started a new project in the Swiss Alps with images between 2016 and 2018. Everything ran fine with the previous version, but with the new one, at the LiCSBAS02_ml_prep stage, it seems to run indefinitely. The CPU usage has been 0% for 3 hours.
PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
6 937 0.0 0.0 28148 3204 pts/8 S Mar10 0:00 | _ bash
7121 0.0 0.0 16576 1528 pts/8 S+ 08:40 0:00 | _ /bin/bash -eu ./batch_LiCSBAS.sh
17000 0.3 1.2 2216628 401384 pts/8 Sl+ 08:41 0:40 | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27232 0.0 1.2 2048596 412872 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27263 0.0 1.2 2048596 410824 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27280 0.0 1.2 2048596 412940 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27316 0.0 1.2 2048596 407716 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27325 0.0 1.2 2048596 414996 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27338 0.0 1.2 2048596 414960 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27352 0.0 1.2 2048596 415004 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27365 0.0 1.2 2048596 414940 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
17001 0.0 0.0 11380 716 pts/8 S+ 08:41 0:00 | _ tee -a log/202103110840batch_LiCSBAS_01_16.log
I tried changing the number of CPUs used, but the result is the same.
It stopped here:
LiCSBAS02_ml_prep.py ver1.7.4 20201119 Y. Morishita
LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
Create E.geo
E.geo created
Create N.geo
N.geo created
Create U.geo
U.geo created
Create slc.mli
slc.mli[.png] created
Create hgt
hgt[.png] created
Create unw and cc
8 parallel processing...
0/141th IFG...
20/141th IFG...
No 20170409_20170509.geo.unw.tif found. Skip
No 20170415_20170427.geo.unw.tif found. Skip
No 20170415_20170509.geo.unw.tif found. Skip
30/141th IFG...
No 20170509_20170521.geo.unw.tif found. Skip
No 20170509_20170602.geo.unw.tif found. Skip
10/141th IFG...
No 20170427_20170515.geo.unw.tif found. Skip
Do you have an idea why it is blocked, or is this phase really that long?