detailed_itineraries function seems to get stuck when (repeatedly) requesting several thousand connections #199

SRN1973 · 2021-09-10T10:28:33Z

When feeding several thousand/million points to the detailed_itineraries function (in chunks of e.g. 5000),
r5r seems to reach a state in which it gets stuck. At first, progress slows down more and more towards the end of
each chunk of from-to points (it seemingly waits for longer-running processes to finish, which can in some instances take a few minutes);
nevertheless, after some time the chunk finishes and r5r jumps to the next chunk of data. After a while, however, the whole calculation seems to get stuck. It looks as if R5 has finished its calculations but r5r is waiting for a return value it never gets (or it slows down so considerably that nothing seems to happen for hours).

The same behaviour occurs when feeding e.g. 1000000 requests to the function at once. (Here approximately 990000 connections are calculated in just a few minutes, then a few more take quite some time (hours), and then no progress can be seen (not even background processes calculating something), while the whole script is still active but seemingly waiting for something it never gets.)

(Prior to using r5r version 0.6.0 I did not encounter this error; however, the input point set I use now has changed.)

(I suspect that the problem described in issue 198 could be related to the odd behaviour described here)

Part of code used

#1) custom function triggering the detailed_itineraries request for chunks of data

# note: mode_egress, walk_speed_kmh and max_trip_duration are not arguments; they are taken from the global environment
r5r_thuenen_detailed <- function(vonPlace, zuPlace, mode, departure_datetime, max_walk_dist, max_rides, r5r_core) {
  result <- NULL

  st <- Sys.time()
  result <- detailed_itineraries(r5r_core = r5r_core,
                                 origins = vonPlace,
                                 destinations = zuPlace,
                                 mode = mode,
                                 mode_egress = mode_egress, # after leaving public transport (WALK, BICYCLE, CAR; default WALK)
                                 departure_datetime = departure_datetime,
                                 max_walk_dist = max_walk_dist,
                                 max_rides = max_rides,
                                 shortest_path = TRUE,
                                 verbose = FALSE,
                                 drop_geometry = TRUE,
                                 walk_speed = walk_speed_kmh,
                                 max_trip_duration = max_trip_duration,
                                 n_threads = 60
  )#eo detailed_itineraries
  print(paste0("Calculation finished. Total computing time: ", format(Sys.time() - st)))

  # a valid result is a data.frame (a data.table is also a data.frame); anything else is reported
  if (!is.data.frame(result)) {
    message("===> Result 1 faulty: <===")
    print(head(result))
  }#eo if

  result
}#eo r5r_thuenen_detailed

#2) Part of the code that calls the above r5r_thuenen_detailed function

print("Generating/loading the transport network...")

r5r_core <- startR5Rcore() #calls a function initializing r5r_core

print(paste0("Starting accessibility analysis calculation: ",Sys.time()))
start <- Sys.time()

accessibilityAnalysisResults <- list() # one list element per chunk

chunkSize <- maxChunkSize #e.g. 1000 or 5000 or 100000
chunkStart <- 1
chunkEnd <- chunkSize
iterationCount <- 1
rJava::.jgc(R.gc = TRUE)
print("")
print(paste0("Chunk Size: ",chunkSize))
print("")

while (chunkStart <= nrow(vonPlace)) {

  # clamp the chunk end before slicing; indexing past the last row would add rows filled with NA
  chunkEnd <- min(chunkEnd, nrow(vonPlace))

  print("##########################################################################################################################")
  print(paste0(chunkStart, "---->", chunkEnd, " of ", nrow(vonPlace), " Chunk calculation start: ", Sys.time(), " Script start: ", start))
  print("##########################################################################################################################")

  vonPlaceChunk <- as.data.frame(vonPlace[chunkStart:chunkEnd, ])
  zuPlaceChunk <- as.data.frame(zuPlace[chunkStart:chunkEnd, ])

  tmpResults <- NULL

  # only pass the chunk on if it contains data; otherwise tmpResults stays NULL
  if (nrow(vonPlaceChunk) > 0) {

    memAvailable <- as.numeric(system("awk '/MemAvailable/ {print $2}' /proc/meminfo", intern = TRUE))
    memTotal <- as.numeric(system("awk '/MemTotal/ {print $2}' /proc/meminfo", intern = TRUE))

    print("...............................................................")
    print(paste0("GB RAM total: ", round(memTotal/1000000, 1), " - GB RAM used: ", round((memTotal - memAvailable)/1000000, 1), " - GB RAM free: ", round(memAvailable/1000000, 2)))
    print("................................................................")

    if (memAvailable/memTotal <= 0.1) {
      print("Less than 10 % of RAM free... trying to free memory")
      rJava::.jgc(R.gc = TRUE)
    }#eo if

    tmpResults <- r5r_thuenen_detailed(vonPlaceChunk, zuPlaceChunk, mode, departure_datetime, max_walk_dist, max_rides, r5r_core)

  }#eo if(nrow(vonPlaceChunk) > 0)

  #print(head(tmpResults))

  chunkEnd <- chunkEnd + chunkSize
  chunkStart <- chunkEnd - chunkSize + 1

  #accessibilityAnalysisResults <- rbind(accessibilityAnalysisResults, tmpResults) #rbind is problematic with big data
  accessibilityAnalysisResults[[iterationCount]] <- tmpResults
  iterationCount <- iterationCount + 1

}#eo while

print(paste0("Calculation finished. Total computing time: ", format(Sys.time() - start)))
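
Editor's sketch: chunk boundaries like the ones used in the loop can also be computed up front, which makes it easy to check that the last chunk never reaches past the end of the input. Indexing a data.frame beyond its last row returns extra rows filled with NA, so an unclamped final chunk would pass NA coordinates on to the router. Whether that played any part in the stall is not established in this thread. The helper name chunk_bounds is hypothetical and not part of r5r.

```r
# Hypothetical helper (not part of r5r): start/end row indices for each chunk,
# with the last chunk clamped to the number of available rows.
chunk_bounds <- function(n_rows, chunk_size) {
  stopifnot(n_rows >= 0, chunk_size >= 1)
  if (n_rows == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- seq(1, n_rows, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1, n_rows)
  data.frame(start = starts, end = ends)
}

# 12 rows in chunks of 5: the last chunk covers rows 11 to 12 only
bounds <- chunk_bounds(n_rows = 12, chunk_size = 5)
print(bounds)

# Why clamping matters: indexing past the last row pads the result with NA rows
toy <- data.frame(id = 1:3)
padded <- toy[1:5, , drop = FALSE]
print(sum(is.na(padded$id)))
```

With such a table, the loop body can slice vonPlace and zuPlace with bounds$start[i]:bounds$end[i] and needs no special case for the last chunk.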

Operating System

Ubuntu 20.04.1 LTS
RAM: 1 TB
120 cores

sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] future_1.21.0 R.utils_2.10.1 R.oo_1.24.0 R.methodsS3_1.8.1 doParallel_1.0.16 iterators_1.0.13 foreach_1.5.1
[8] r5r_0.6.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 pillar_1.4.7 compiler_3.6.3 class_7.3-17 tools_3.6.3 digest_0.6.27 checkmate_2.0.0
[8] lifecycle_0.2.0 tibble_3.0.4 pkgconfig_2.0.3 rlang_0.4.10 jdx_0.1.4 DBI_1.1.1 rstudioapi_0.13
[15] curl_4.3.2 rJava_1.0-4 e1071_1.7-8 httr_1.4.2 dplyr_1.0.3 globals_0.14.0 generics_0.1.0
[22] vctrs_0.3.6 classInt_0.4-3 grid_3.6.3 tidyselect_1.1.0 glue_1.4.2 data.table_1.14.0 listenv_0.8.0
[29] sf_1.0-2 R6_2.5.1 parallelly_1.23.0 purrr_0.3.4 magrittr_2.0.1 backports_1.2.1 codetools_0.2-18
[36] ellipsis_0.3.1 units_0.7-2 KernSmooth_2.23-18 proxy_0.4-26 crayon_1.3.4


SRN1973 · 2021-09-15T13:25:18Z

After experimenting with the function parameters I realized that "max_trip_duration" and especially "max_rides" are responsible for the behaviour I described above. Every increase in "max_rides" means the RAPTOR algorithm needs to do an extra round of search, with more public transport routes to consider. So the search space can grow exponentially in bigger networks, and this seems to push R5 into a state where it stops working properly...
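
Editor's sketch of the arithmetic behind that growth: if each boarding opens up a fixed number of onward routes, the number of possible ride sequences multiplies with every additional ride. The branching factor of 20 below is a made-up illustrative value, not a measurement of the German network, and the figures are an upper bound on ride sequences, not a count of what R5 actually evaluates (a round-based router need not enumerate them all).

```r
# Assumed, purely illustrative branching factor: onward routes reachable per boarding
branching <- 20

# Upper bound on distinct ride sequences for 1 to 5 rides
max_rides <- 1:5
upper_bound <- branching^max_rides

print(data.frame(max_rides = max_rides, upper_bound = upper_bound))
```

Under this assumption, going from 3 to 4 rides raises the bound from 8000 to 160000 sequences, which gives a feel for why a small change in max_rides can have a large effect on a national network.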

For a network covering all of Germany, including the complete German public transport system, setting "max_trip_duration" to 120 and "max_rides" to 3 (on a server with 120 cores and 1 TB RAM) technically worked for me (although it is still not a perfect solution, since restricting max_rides to three also restricts the validity of my analysis results...).

rafapereirabr · 2021-09-16T00:18:56Z

Thank you for the feedback, @SRN1973. Have you considered using the travel_time_matrix() function? It is much faster than detailed_itineraries()

SRN1973 · 2021-09-16T07:14:04Z

Dear Rafael, thank you for the hint about travel_time_matrix(). I had already considered it, but unfortunately it does not do exactly what I need. I have distinct routes to calculate (not every origin to every destination), so I would have to feed the routes in one by one (around 28 million altogether) and could not make use of the function's fast parallelization. In addition, I am interested in the travel times of the individual trip legs, so that I can remove the initial waiting time from my results (assuming a well-informed traveller who times the start of the trip so that he/she does not have to wait) or determine the time spent on public transport only, etc.

Regards, Stefan

-- Dr. Stefan Neumeier, Thünen-Institut für Ländliche Räume, Braunschweig (http://www.thuenen.de)
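
Editor's sketch of the per-leg post-processing described in this reply (dropping the initial wait and isolating public transport time). It runs on a toy table, not on real r5r output. The column names (from_id, to_id, option, segment, mode, segment_duration, wait) are assumed to match what detailed_itineraries() returns with drop_geometry = TRUE and should be checked against the installed r5r version. It is also assumed that wait is reported separately from segment_duration. The helper summarise_trip is hypothetical.

```r
# Toy stand-in for leg-level output; all values are invented, durations in minutes.
legs <- data.frame(
  from_id = "a", to_id = "b", option = 1,
  segment = 1:4,
  mode = c("WALK", "BUS", "RAIL", "WALK"),
  segment_duration = c(5, 12, 30, 4),
  wait = c(0, 9, 6, 0),  # assumed: time waited before boarding each leg
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the legs of one itinerary into summary times.
summarise_trip <- function(trip) {
  trip <- trip[order(trip$segment), ]
  is_transit <- !(trip$mode %in% c("WALK", "BICYCLE", "CAR"))
  first_transit <- which(is_transit)[1]
  # initial wait = wait before the first public transport leg (0 if there is none)
  initial_wait <- if (is.na(first_transit)) 0 else trip$wait[first_transit]
  total <- sum(trip$segment_duration) + sum(trip$wait)
  data.frame(
    from_id = trip$from_id[1], to_id = trip$to_id[1], option = trip$option[1],
    total = total,
    total_without_initial_wait = total - initial_wait,
    transit_in_vehicle = sum(trip$segment_duration[is_transit])
  )
}

# One summary row per origin, destination and itinerary option
groups <- split(legs, list(legs$from_id, legs$to_id, legs$option), drop = TRUE)
trip_summary <- do.call(rbind, lapply(groups, summarise_trip))
print(trip_summary)
```

For millions of itineraries a grouped data.table aggregation would likely be faster than split() and lapply(); base R is used here only to keep the sketch free of dependencies.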

rafapereirabr · 2021-09-16T11:49:14Z

Hi Stefan @SRN1973. I see. You don't really need the geometries, though, do you?

ps. We might be able to address the second part of the problem in the near future. Keep an eye on issue #194.

SRN1973 closed this as completed Sep 15, 2021
