detailed_itineraries() function seems to get stuck when repeatedly requesting several thousand connections #199

Closed
SRN1973 opened this issue Sep 10, 2021 · 4 comments

SRN1973 commented Sep 10, 2021

When I feed several thousand to several million points to the detailed_itineraries() function (in chunks of e.g. 5000), r5r eventually seems to get stuck. At first, towards the end of each chunk of from-to points, progress gets slower and slower (it seemingly waits for a few long-running processes to finish, which can take a few minutes in some instances). Nevertheless, the chunk eventually finishes and processing jumps to the next chunk. After some time, however, the whole calculation seems to get stuck. It looks as if R5 has finished its calculations but r5r is waiting for a return value it never gets (or it slows down so much that nothing seems to happen for hours).

The same behaviour occurs when feeding e.g. 1,000,000 requests to the function at once. Here approximately 990,000 connections are calculated in just a few minutes, then a few more take quite some time (hours), and then no progress can be seen (not even background processes calculating something), although the whole script is still active and seemingly waiting for something it does not get.

(Prior to r5r version 0.6.0 I did not encounter this error; however, the input point set I use now has also changed.)

(I suspect that the problem described in issue 198 could be related to the odd behaviour described here.)

Part of code used

#1) custom function triggering the detailed_itineraries request for chunks of data

# note: mode_egress, walk_speed_kmh and max_trip_duration are not arguments of this function;
# they are taken from the calling (global) environment
r5r_thuenen_detailed <- function(vonPlace,zuPlace,mode,departure_datetime,max_walk_dist,max_rides,r5r_core) {
  result <- NULL

  st <- Sys.time()
  result <- detailed_itineraries(r5r_core = r5r_core,
                                 origins = vonPlace,
                                 destinations = zuPlace,
                                 mode = mode,
                                 mode_egress = mode_egress, # mode after leaving public transport (WALK, BICYCLE, CAR -> default WALK)
                                 departure_datetime = departure_datetime,
                                 max_walk_dist = max_walk_dist,
                                 max_rides = max_rides,
                                 shortest_path = TRUE,
                                 verbose = FALSE,
                                 drop_geometry = TRUE,
                                 walk_speed = walk_speed_kmh,
                                 max_trip_duration = max_trip_duration,
                                 n_threads = 60
  )#eo detailed_itineraries
  # format() keeps the time unit of the difftime
  print(paste0("Calculation finished. Total computation time: ",format(Sys.time()-st)))

  # a data.table is also a data.frame, so one check covers both;
  # it also avoids calling nrow() on a NULL result
  if(!is.data.frame(result)) {
    message("===> Result 1 faulty: <===")
    print(head(result))
  }#eo if

  result
}#eo r5r_thuenen_detailed

#2) Part of the code that calls the above r5r_thuenen_detailed function

print("Generating/loading the transport network...")

r5r_core<- startR5Rcore() #calls a function initializing r5r_core

print(paste0("Starting accessibility analysis calculation: ",Sys.time()))
start <- Sys.time()

accessibilityAnalysisResults <- list() # one list element per chunk

chunkSize <- maxChunkSize #e.g. 1000 or 5000 or 100000
chunkStart <- 1
chunkEnd <- chunkSize
iterationCount <- 1
rJava::.jgc(R.gc = TRUE)
print("")
print(paste0("Chunk Size: ",chunkSize))
print("")

while(chunkStart <= nrow(vonPlace)) {

  # clamp the chunk end before subsetting; indexing past the last row would add NA rows to the last chunk
  chunkEnd <- min(chunkEnd, nrow(vonPlace))

  print("##########################################################################################################################")
  print(paste0(chunkStart,"---->",chunkEnd," of ",nrow(vonPlace)," Calculation chunk start: ", Sys.time()," Script start: ",start))
  print("##########################################################################################################################")

  vonPlaceChunk <- as.data.frame(vonPlace[chunkStart:chunkEnd,])
  zuPlaceChunk <- as.data.frame(zuPlace[chunkStart:chunkEnd,])

  tmpResults <- NULL

  # only pass the chunk on if it contains data; otherwise tmpResults stays NULL
  if(nrow(vonPlaceChunk) >0) {

    # /proc/meminfo reports both values in kB
    memAvailable <- as.numeric(system("awk '/MemAvailable/ {print $2}' /proc/meminfo", intern=TRUE))
    memTotal <- as.numeric(system("awk '/MemTotal/ {print $2}' /proc/meminfo", intern=TRUE))

    print("...............................................................")
    print(paste0("GB RAM total: ",round(memTotal/1000000,1)," - GB RAM used: ",round((memTotal-memAvailable)/1000000,1)," - GB RAM free: ",round(memAvailable/1000000,2)))
    print("................................................................")

    # the threshold 0.1 is a fraction, i.e. 10 % of total RAM
    if(memAvailable/memTotal <= 0.1) {
      print("Less than 10 % RAM free...trying to free RAM")
      rJava::.jgc(R.gc = TRUE)
    }#eo if

    tmpResults <- r5r_thuenen_detailed(vonPlaceChunk ,zuPlaceChunk ,mode,departure_datetime,max_walk_dist,max_rides,r5r_core)

  }#eo if(nrow(vonPlaceChunk) >0)

  #print(head(tmpResults))

  chunkStart <- chunkEnd + 1
  chunkEnd <- chunkEnd + chunkSize

  #accessibilityAnalysisResults <- rbind(accessibilityAnalysisResults, tmpResults) #rbind is problematic with big data
  accessibilityAnalysisResults[[iterationCount]] <- tmpResults
  iterationCount <- iterationCount+1

}#eo while

print(paste0("Calculation finished. Total computation time: ",format(Sys.time()-start)))
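
The chunk bookkeeping in the loop above can also be computed up front, so that no chunk can index past the last row. This is only a minimal sketch in base R: the helper name chunk_bounds and the toy data frame are assumptions made for illustration and are not part of r5r or of the original script.

```r
# Hypothetical helper (not part of r5r): precompute inclusive start/end row
# indices for each chunk so that the last chunk never runs past n_rows.
chunk_bounds <- function(n_rows, chunk_size) {
  stopifnot(n_rows >= 0, chunk_size >= 1)
  if (n_rows == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- seq(1, n_rows, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1, n_rows)
  data.frame(start = starts, end = ends)
}

# Toy data standing in for vonPlace (12 rows, chunks of 5)
vonPlace <- data.frame(id = 1:12, lon = runif(12), lat = runif(12))
bounds <- chunk_bounds(nrow(vonPlace), chunk_size = 5)

results <- vector("list", nrow(bounds))
for (i in seq_len(nrow(bounds))) {
  rows <- bounds$start[i]:bounds$end[i]
  vonPlaceChunk <- vonPlace[rows, ]
  # the per-chunk call to detailed_itineraries() would go here;
  # the chunk itself is stored only to keep this sketch self-contained
  results[[i]] <- vonPlaceChunk
}
```

The per-chunk results can then be combined once at the end, for example with do.call(rbind, results) or data.table::rbindlist(results), instead of growing one table inside the loop.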


Operating System

Ubuntu 20.04.1 LTS
RAM: 1 TB
120 cores


sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] future_1.21.0 R.utils_2.10.1 R.oo_1.24.0 R.methodsS3_1.8.1 doParallel_1.0.16 iterators_1.0.13 foreach_1.5.1
[8] r5r_0.6.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 pillar_1.4.7 compiler_3.6.3 class_7.3-17 tools_3.6.3 digest_0.6.27 checkmate_2.0.0
[8] lifecycle_0.2.0 tibble_3.0.4 pkgconfig_2.0.3 rlang_0.4.10 jdx_0.1.4 DBI_1.1.1 rstudioapi_0.13
[15] curl_4.3.2 rJava_1.0-4 e1071_1.7-8 httr_1.4.2 dplyr_1.0.3 globals_0.14.0 generics_0.1.0
[22] vctrs_0.3.6 classInt_0.4-3 grid_3.6.3 tidyselect_1.1.0 glue_1.4.2 data.table_1.14.0 listenv_0.8.0
[29] sf_1.0-2 R6_2.5.1 parallelly_1.23.0 purrr_0.3.4 magrittr_2.0.1 backports_1.2.1 codetools_0.2-18
[36] ellipsis_0.3.1 units_0.7-2 KernSmooth_2.23-18 proxy_0.4-26 crayon_1.3.4


SRN1973 commented Sep 15, 2021

After playing around with the function parameters I realized that the "max_trip_duration" and especially the "max_rides" parameters are responsible for the behaviour I described above. Every increase in "max_rides" means the RAPTOR algorithm needs to do an extra round of search, with more public transport routes to consider, so the search space can grow exponentially in bigger networks. This seems to push R5 into a state where it stops working properly...

For a network covering all of Germany, including all German public transport, setting "max_trip_duration" to 120 and "max_rides" to 3 (on a server with 120 cores and 1 TB RAM) technically worked for me (although it is still not a perfect solution, as restricting max_rides to three also restricts the validity of my analysis results...).

@SRN1973 SRN1973 closed this as completed Sep 15, 2021
@rafapereirabr
Member

Thank you for the feedback, @SRN1973. Have you considered using the travel_time_matrix() function? It is much faster than detailed_itineraries()


SRN1973 commented Sep 16, 2021 via email

@rafapereirabr
Member

Hi Stefan @SRN1973. I see, but you don't really need the geometries, though, do you?

ps. We might be able to address the second part of the problem in the near future. Keep an eye on issue #194.
