You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As per @paleolimbot 's suggestion, here is a reprex of a benchmark comparing spatial intersects with sf vs geos . st_intersects does seem faster than geos_intersects_matrix() at the moment.
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.0, PROJ 8.1.1; sf_use_s2() is TRUE
library(geos)
library(data.table)
library(microbenchmark)
options(timeout=10000)
rm(list= ls())
gc()
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)#> Ncells 901867 48.2 1397940 74.7 NA 1397940 74.7#> Vcells 1578643 12.1 8388608 64.0 16384 2650514 20.3url_trips="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv"
download.file(url=url_trips, destfile="trip_records.csv", mode="wb") # takes some time and is pretty heavy!trips.dt= fread("trip_records.csv", nrows=100000) # enough rows to benchmark performance# Keep only pickup (origin) coordinate columnscols_drop= grep("pickup_l", names(trips.dt), value=T, invert=T)
trips.dt[, (cols_drop) :=NULL]
trips.dt=trips.dt[!(pickup_longitude==0|pickup_latitude==0)] # obviously invalid coordinates# Generate geomtrips.dt[, `:=`(o_geom= geos_make_point(pickup_longitude,
pickup_latitude,
crs=4326))]
# Generate sftrips.sf= st_as_sf(trips.dt)
trips.sf= st_transform(trips.sf, 4326)
# Get polygons to intersecturl_zones="https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip"
download.file(url_zones, destfile="zones_polygons.zip", mode="wb")
unzip("zones_polygons.zip", exdir= file.path("zones"))
# sf versionzones.sf= st_read(file.path("zones", "taxi_zones.shp"))
#> Reading layer `taxi_zones' from data source #> `/private/var/folders/y3/cg53v8tx3g538m3751w1_9v80000gn/T/RtmpwdNjKW/reprex-4d044a0e522a-cilia-stag/zones/taxi_zones.shp' #> using driver `ESRI Shapefile'#> Simple feature collection with 263 features and 6 fields#> Geometry type: MULTIPOLYGON#> Dimension: XY#> Bounding box: xmin: 913175.1 ymin: 120121.9 xmax: 1067383 ymax: 272844.3#> Projected CRS: NAD83 / New York Long Island (ftUS)zones.sf= st_transform(zones.sf, 4326)
# geom versionzones.dt= as.data.table(zones.sf)[, geom:= as_geos_geometry(geometry, crs=4326)]
zones.dt[, geometry:=NULL]
# Functions to testgeos.dt=function(x) {
geos_intersects_matrix(zones.dt[, geom], trips.dt[, o_geom])
}
sf=function(x) {
st_intersects(zones.sf, trips.sf)
}
# Benchmark
microbenchmark(geos= geos.dt(),
sf= sf(),
times=10)
#> Unit: seconds#> expr min lq mean median uq max neval#> geos 4.217741 4.627104 5.214595 5.222753 5.842305 5.876518 10#> sf 3.014916 3.103549 3.493810 3.328212 3.455878 5.066740 10
I'm also not sure that {sf} and {geos} work exactly the same way. geos_intersects_matrix() expects STRTree as the second argument and it takes the most time to convert to this type, but @paleolimbot will explain it best.
system.time(trips.dt2<- as_geos_strtree(trips.dt[, o_geom]))
#> user system elapsed #> 4.88 0.02 4.89
system.time(geos_intersects_matrix(zones.dt[, geom], trips.dt2))
#> user system elapsed #> 0.14 0.00 0.14
I'm 99% certain that the difference is because of how accumulating potentially matching features (based on bbox) is implemented (in sf it's std::vector and here I over-allocate an intermediary buffer that's way too big and I should just use std::vector).
As per @paleolimbot 's suggestion, here is a reprex of a benchmark comparing spatial intersects with
sf
vsgeos
.st_intersects
does seem faster thangeos_intersects_matrix()
at the moment.Created on 2022-02-15 by the reprex package (v2.0.1)
The text was updated successfully, but these errors were encountered: