Licsbas02_ml_prep infinite run #91

Closed
gpsovsicori opened this issue Mar 11, 2021 · 25 comments
Labels
bug Something isn't working

Comments

@gpsovsicori

Hello Yu,

I installed the new version; I had previously run LiCSBAS with the March 2020 version. The installation check script reported that the installation is OK. I started a new project in the Swiss Alps with images from 2016-2018. Everything ran fine with the previous version, but with the new one, at the LiCSBAS02_ml_prep stage, it seems to run indefinitely. The CPU usage has been 0 for 3 hours.
PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
6 937 0.0 0.0 28148 3204 pts/8 S Mar10 0:00 | _ bash
7121 0.0 0.0 16576 1528 pts/8 S+ 08:40 0:00 | _ /bin/bash -eu ./batch_LiCSBAS.sh
17000 0.3 1.2 2216628 401384 pts/8 Sl+ 08:41 0:40 | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27232 0.0 1.2 2048596 412872 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27263 0.0 1.2 2048596 410824 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27280 0.0 1.2 2048596 412940 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27316 0.0 1.2 2048596 407716 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27325 0.0 1.2 2048596 414996 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27338 0.0 1.2 2048596 414960 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27352 0.0 1.2 2048596 415004 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
27365 0.0 1.2 2048596 414940 pts/8 S+ 08:41 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8
17001 0.0 0.0 11380 716 pts/8 S+ 08:41 0:00 | _ tee -a log/202103110840batch_LiCSBAS_01_16.log

I tried changing the number of CPUs used, but the result is the same.
It stops here:

LiCSBAS02_ml_prep.py ver1.7.4 20201119 Y. Morishita
LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 8

Create E.geo
E.geo created

Create N.geo
N.geo created

Create U.geo
U.geo created

Create slc.mli
slc.mli[.png] created

Create hgt
hgt[.png] created

Create unw and cc
8 parallel processing...
0/141th IFG...
20/141th IFG...
No 20170409_20170509.geo.unw.tif found. Skip
No 20170415_20170427.geo.unw.tif found. Skip
No 20170415_20170509.geo.unw.tif found. Skip
30/141th IFG...
No 20170509_20170521.geo.unw.tif found. Skip
No 20170509_20170602.geo.unw.tif found. Skip
10/141th IFG...
No 20170427_20170515.geo.unw.tif found. Skip

Do you have any idea why it is blocked, or is this phase just that long?

@yumorishita
Owner

Please try --n_para 1

Also, could you tell me the details of your environment listed below?

  • OS
  • type and number of CPU
  • RAM size
  • Working disk (e.g., HDD, SSD, external with USB3.0, NAS, etc.)
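For reference, a minimal sketch using only the Python standard library that collects most of these details on Linux. It is not part of LiCSBAS: `environment_report` is a hypothetical helper, and the disk type (HDD, SSD, NAS, etc.) still has to be reported by hand because the standard library does not expose it.

```python
import os
import platform
import shutil


def environment_report(work_dir="."):
    """Collect OS, CPU, RAM and free space on the working disk (hypothetical helper)."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        # processor() may return an empty string on Linux; fall back to the machine type.
        "cpu": platform.processor() or platform.machine(),
        "n_cpu": os.cpu_count(),
    }
    # Physical RAM via POSIX sysconf; these names exist on Linux but not on every platform.
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["ram_gib"] = round(ram_bytes / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        report["ram_gib"] = None
    # Free space only; the disk type (HDD/SSD/NAS) is not available from the stdlib.
    report["disk_free_gib"] = round(shutil.disk_usage(work_dir).free / 2**30, 1)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it from the LiCSBAS working directory makes `disk_free_gib` refer to the disk actually used for processing.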

@gpsovsicori
Author

gpsovsicori commented Mar 12, 2021

Thanks. I tried with 1 processor, but the result is the same:
ovsicori 4879 1.1 1.3 2216628 434364 pts/8 Sl+ 18:58 0:37 | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 1
ovsicori 20199 0.0 1.3 2048460 439184 pts/8 S+ 18:58 0:00 | | _ python3 /respaldo/InSAR/LiCSAR/LiCSBAS/LiCSBAS/bin/LiCSBAS02_ml_prep.py -i GEOC -n 1 --n_para 1
ovsicori 4880 0.0 0.0 11380 712 pts/8 S+ 18:58 0:00 | _ tee -a log/202103111854batch_LiCSBAS_01_16.log
ovsicori 360 0.0 0.0 25840 1260 pts/9 Ss+ Mar10 0:00 _ -csh

My config is the following:

Ubuntu 12.04 LTS
OS type 64bit
processor Intel® Xeon(R) Gold 6140 CPU @ 2.30GHz × 16
RAM 31.4 GiB
Disk 3+ TB within a cluster of disks.

I used the same configuration with the March 2020 version and it was working. The only change I had to make was to update GDAL to meet the requirements.

@yumorishita
Owner

How about nlook=10 and n_para=1?

The multiprocessing module sometimes seems to get stuck on a cluster. In many cases, n_para=1 avoids the problem. I have no idea how to fix it because I have no cluster and cannot reproduce the situation...
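The reason n_para=1 can help is that a purely serial loop never starts worker processes, so a stuck worker cannot block the run. The sketch below illustrates that pattern; it is not LiCSBAS code, and `process_ifg` and `run_all` are hypothetical names. For n_para > 1 it also waits on the pool result with a timeout, so a hang shows up as an error instead of an endless wait at 0% CPU.

```python
import multiprocessing as mp


def process_ifg(ifgd):
    """Hypothetical per-interferogram task standing in for the real multilook work."""
    return f"{ifgd} done"


def run_all(ifgdates, n_para=1, timeout=600):
    """Process all ifgs, serially when n_para == 1, otherwise with a process pool."""
    if n_para == 1:
        # No worker processes at all: nothing in multiprocessing can hang.
        return [process_ifg(d) for d in ifgdates]
    with mp.Pool(n_para) as pool:
        async_result = pool.map_async(process_ifg, ifgdates)
        # Raises multiprocessing.TimeoutError after `timeout` seconds instead of
        # waiting forever; leaving the `with` block then terminates the workers.
        return async_result.get(timeout=timeout)


if __name__ == "__main__":
    print(run_all(["20170409_20170509", "20170415_20170427"], n_para=1))
```

The timeout value is an arbitrary assumption; it only needs to exceed the normal runtime of the step.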

yumorishita added the bug (Something isn't working) label on Mar 14, 2021
@gpsovsicori
Author

gpsovsicori commented Mar 15, 2021

Hello Yu,

I get the same results with nlook=10 and n_para=1. I tried the script with other tracks and it stopped at the same step. I spoke with IT about the cluster, but they said it is a normal disk on a server. It might be Ubuntu 12.04 that causes the issue. Thanks a lot for your help.

@yumorishita
Owner

What is the frame ID?

@yumorishita
Owner

Have you tried again after removing GEOCml*?

@gpsovsicori
Author

Initially I tried with 066D_04410_131313, and later I used one that I had already processed, 157D_07909_131307. I created a new folder and, yes, each time I rerun the batch I delete all the folders except the interferograms (rm -r GACOS/ GEOCml1* log). I also set GACOS = no for both GACOS options.
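The same cleanup can be scripted so that the downloaded GEOC directory is never removed by accident. A minimal sketch; `clean_outputs` is a hypothetical helper, not a LiCSBAS command, and its default patterns simply mirror the `rm -r GACOS/ GEOCml1* log` above, so they may need adjusting to the directories actually present.

```python
import glob
import os
import shutil


def clean_outputs(work_dir, patterns=("GACOS", "GEOCml*", "log")):
    """Remove previous outputs in work_dir while keeping the downloaded GEOC dir."""
    removed = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(work_dir, pattern))):
            if os.path.basename(path) == "GEOC":
                continue  # never touch the downloaded interferograms
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
            removed.append(os.path.basename(path))
    return removed
```

For example, `clean_outputs(".")` run from the frame directory returns the names of the removed entries, which is a quick way to confirm that nothing unexpected was deleted.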
Here is the log file up to where it stops:
log_157D_licsbas.txt

I am installing LiCSBAS on another machine with a more recent Ubuntu. I will let you know the outcome.

@yumorishita
Owner

yumorishita commented Mar 16, 2021

I have tested these frames and the processing finished successfully. Perhaps some of the data downloaded into your GEOC dir are corrupted. I suggest removing the GEOC dir and trying again with nlook=10.
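Before re-downloading everything, a quick sanity check of GEOC can point to obviously broken files. This is a sketch under assumptions: the layout GEOC/yyyymmdd_yyyymmdd/yyyymmdd_yyyymmdd.geo.{unw,cc}.tif is inferred from the file names in the log above, and the helper names are hypothetical. The header check only catches empty files or non-TIFF content (e.g. an error page saved under a .tif name); a file truncated mid-download still starts with a valid header and needs a full read (e.g. with GDAL) to detect.

```python
import glob
import os
import re

# Magic bytes that open every classic TIFF or BigTIFF file (little/big endian).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")
# Interferogram sub-directories are assumed to be named yyyymmdd_yyyymmdd.
IFG_DIR = re.compile(r"^\d{8}_\d{8}$")


def suspicious_tifs(geoc_dir):
    """Return GeoTIFFs under geoc_dir that are empty or do not start with a TIFF header."""
    bad = []
    for path in sorted(glob.glob(os.path.join(geoc_dir, "**", "*.tif"), recursive=True)):
        with open(path, "rb") as f:
            head = f.read(4)
        if head not in TIFF_MAGIC:  # an empty file gives head == b""
            bad.append(path)
    return bad


def missing_pairs(geoc_dir):
    """Return expected unw/cc file names that are absent from ifg sub-directories."""
    missing = []
    for ifg_dir in sorted(glob.glob(os.path.join(geoc_dir, "*"))):
        ifgd = os.path.basename(ifg_dir)
        if not (os.path.isdir(ifg_dir) and IFG_DIR.match(ifgd)):
            continue
        for kind in ("unw", "cc"):
            name = f"{ifgd}.geo.{kind}.tif"
            if not os.path.exists(os.path.join(ifg_dir, name)):
                missing.append(name)
    return missing
```

Missing files are skipped by LiCSBAS02 itself ("No ... found. Skip"), so `missing_pairs` is mainly useful for deciding which interferograms to download again.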

@gpsovsicori
Author

gpsovsicori commented Mar 19, 2021 via email

@gpsovsicori
Author

gpsovsicori commented Mar 23, 2021 via email

@yumorishita
Owner

I have never seen this error. Could you upload the full log file?

@gpsovsicori
Author

gpsovsicori commented Mar 24, 2021 via email

@yumorishita
Owner

Where is the log file?

@gpsovsicori
Author

It was attached to my email, but apparently GitHub does not accept emails with attachments. Uploading it directly on GitHub should work better:
202103231434batch_LiCSBAS_01_16.log

@yumorishita
Owner

I have tried processing the same data but could not reproduce the error. The error is related to writing to cum.h5. Please try again after removing the existing cum.h5.

@gpsovsicori
Author

Hello Yu,
Thank you for your patience. Here are the different tests I have done so far:

a) I deleted cum.h5 and reran, but I get the same error. I attached the new log as well as the batch file, in case you can detect something suspicious:
batch_LiCSBAS.txt
202103260844batch_LiCSBAS_01_16.log

b) I also tried with another area and a different frame (084D_08014_091312) and got the same error. I added the log file and the batch:
202103260859batch_LiCSBAS_01_16.log

c) These two runs were done from an Ubuntu 20.04 virtual machine, with the disk on a server. I made another attempt, downloading LiCSBAS from GitHub locally onto the virtual machine.

Then I started from scratch for a new area, locally. I used the provided batch file, and the only edit I made was p05_clip_range_geo="-84.80/-84.50/9.96/10.25". I could avoid the previous error, but I get stuck at "Identifing gaps, and counting n_gap and n_ifg_noloop, with 4 parallel processing..."

At this stage, the script seems to run indefinitely (CPU usage 0%; I waited 4 hours, while the other steps last several minutes). This error is similar to the first one with Ubuntu 12.04, though at a later step.

Here is the log:
202103260921batch_LiCSBAS_01_16.log

I also tried with these frames: 092A_07941_091203 and 084D_08014_091312. The process also stops, but later, at step 16.

Here:
"
LiCSBAS16_filt_ts.py ver1.5.1 20210311 Y. Morishita
LiCSBAS16_filt_ts.py -t TS_GEOCml1 -s 1

Size of image (w,l) : 2738, 2853
Number of images : 95
Width of filter in space : 1.0 km (9.1x9.0 pixel)
Width of filter in time : 0.163 yr (59 days)
Deramp flag : []
hgt-linear flag : False

HP filter in time, LP filter in space,
with 4 parallel processing...
0/ 95th image...
20/ 95th image...
10/ 95th image...
30/ 95th image...
60/ 95th image...
90/ 95th image...
80/ 95th image...
"

I also tried different parameters, for instance:
p11_unw_thre="0.25" # default: 0.3
p11_coh_thre="0.04" # default: 0.05
p12_loop_thre="2" # default: 1.5 rad
p12_multi_prime="y" # y/n. y recommended
p12_rm_ifg_list="" # List file containing ifgs to be manually removed
p15_coh_thre="" # default: 0.05
p15_n_unw_r_thre="1.3" # default: 1.5

Changing these parameters changes the step at which the process stops.

Well, this is what I have tried over the last few days. It doesn't make any sense to me, but I hope it does to you, Yu.

If not, don't worry, I will try to find the solution.

Once again thank you for your patience.

@yumorishita
Owner

Could you show me the output of LiCSBAS_check_install.py, which includes the versions of the modules?

@gpsovsicori
Author

(miniconda3-latest) root@ovsicori-virtual-machine:~/LiCSBAS_processing/092A_07941_091203# LiCSBAS_check_install.py

Python version: 3.8.2
OK

Check required modues and versions
astropy(4.2) OK
bs4(4.9.3) OK
h5py(2.10.0) OK
matplotlib(3.3.4) OK
numpy(1.19.2) OK
psutil(5.8.0) OK
requests(2.25.1) OK
statsmodels(0.12.2) OK
gdal(3.0.2) OK

Check LiCSBAS commands
OK

Check LiCSBAS library
OK

LiCSBAS install is OK

[1]+ Done LiCSBAS_plot_ts.py -i TS_GEOCml1/cum_filt.h5
(miniconda3-latest) root@ovsicori-virtual-machine:~/LiCSBAS_processing/092A_07941_091203#

@yumorishita
Owner

The module versions are exactly the same as mine. I have no idea about the error in step 13.

For the stop in step 16, please try n_para=1 (see #86).

@gpsovsicori
Author

Yes, it works! That's great, Yu!
Thank you very, very much!

@yumorishita
Owner

I am glad to hear that. I would appreciate it if you could solve the error in step 13 and post how you solved it.

@gpsovsicori
Author

Hello Yu, for step 13, I just worked locally instead of on a server. I am not sure if it is an issue with write permissions or something else. I am currently away, but I will check it next week when I am back.
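If the step 13 error does come from the server disk, one way to check is to attempt an actual write in the output directory. A minimal sketch; `can_write` is a hypothetical helper, not part of LiCSBAS. It performs a real write because, as the Python documentation for `os.access()` notes, I/O can still fail when `access()` says it would succeed, particularly on network filesystems.

```python
import os
import tempfile


def can_write(directory):
    """Return True if a file can really be created, written and synced in directory."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as tmp:
            tmp.write(b"test")
            tmp.flush()
            os.fsync(tmp.fileno())
        return True
    except OSError:  # includes PermissionError and FileNotFoundError
        return False
```

For example, `can_write("TS_GEOCml1")` on the server disk versus the local disk would show whether plain file writes differ between the two. Separately, HDF5 1.10 and later lock files when opening them, which is known to fail on some network filesystems; setting the environment variable `HDF5_USE_FILE_LOCKING=FALSE` before running may be worth a try, although nothing in this thread confirms that this was the cause.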

@gpsovsicori
Author

Hello Yu, I am back from holidays. I have more time now.
How/where should I help you to post the solution to error in step 13?

@yumorishita
Owner

This thread would be fine.

@zahraforoodi

Hello, dear users,
I also encountered an error:
LiCSBAS02_ml_prep.py -i GEOC -n 10

LiCSBAS02_ml_prep.py ver1.7.4 20201119 Y. Morishita
LiCSBAS02_ml_prep.py -i GEOC -n 10

Create E.geo
E.geo created

Create N.geo
N.geo created

Create U.geo
U.geo created

Create slc.mli
No *.geo.mli.tif found in GEOC

Create hgt
hgt[.png] created

Create unw and cc

Create slc.mli.par
Traceback (most recent call last):
  File "/home/zahra/anaconda3/LiCSBAS/bin/LiCSBAS02_ml_prep.py", line 453, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/zahra/anaconda3/LiCSBAS/bin/LiCSBAS02_ml_prep.py", line 325, in main
    print('range_samples: {}'.format(width), file=f)
                                     ^^^^^
UnboundLocalError: cannot access local variable 'width' where it is not associated with a value
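The traceback suggests, although this has not been checked against the LiCSBAS02_ml_prep.py source, that `width` is only assigned while reading input files. The log shows "No *.geo.mli.tif found in GEOC" and no progress lines after "Create unw and cc", which is consistent with GEOC containing no usable interferograms. The Python pattern behind this error, and a defensive variant, can be sketched as follows; `get_width` and `unw_widths` are hypothetical names, with the dictionary standing in for opening each GeoTIFF.

```python
def get_width(ifgdates, unw_widths):
    """Return the raster width, failing with a clear message when no unw file exists.

    unw_widths maps existing file names to their widths (a stand-in for reading
    each GeoTIFF). Without the `width = None` line, an input with no matching
    files would raise the same UnboundLocalError as in the traceback above.
    """
    width = None  # make the "nothing found" case explicit
    for ifgd in ifgdates:
        name = f"{ifgd}.geo.unw.tif"
        if name not in unw_widths:
            continue  # mirrors "No ... found. Skip"
        width = unw_widths[name]
    if width is None:
        raise FileNotFoundError("No *.geo.unw.tif found; check the GEOC directory given with -i")
    return width
```

In practice, this means checking that the directory passed with `-i` actually contains the yyyymmdd_yyyymmdd interferogram folders before running the step again.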
