
Possible to implement asynchronous (i.e. non-blocking io) in httr? #271

Closed
timwilliate opened this issue Aug 12, 2015 · 14 comments

@timwilliate

This is not so much an issue as a question I wanted to pose to the community. If there is already a way to achieve this, I apologize for the repeat and would appreciate being pointed in the right direction.

I use httr extensively and often find myself in situations where the required aggregation of data takes hundreds of thousands of REST calls. This becomes performance-limiting in R because it seems that every HTTP call made with httr is a blocking call.

Is there any plan or path forward for enabling asynchronous io within httr similar to what exists in Python via aiohttp or Scala via Akka-http?

Thank you

@jeroen
Member

jeroen commented Aug 13, 2015

Do you just want concurrent downloads, or a JavaScript/AJAX-style framework with xhr objects that implement callbacks for success, failure, etc.? I am not sure the R language is very suitable for the latter, due to the lack of an event loop and the difficulties of threading. But maybe I'm wrong.

What would such an interface look like if you could design it? Can you elaborate on your use case? Are these thousands of requests all parallel or is there a structure where some calls need data from a previous call?

@zachmayer

Here's an example from RCurl:

getURIs <- function(uris, ..., multiHandle = getCurlMultiHandle(), .perform = TRUE)
{
  content <- list()

  # Create one handle per URI, attach a text gatherer to collect its body,
  # and push it onto the shared multi handle
  for (i in uris) {
    curl <- getCurlHandle()
    content[[i]] <- basicTextGatherer()
    opts <- curlOptions(URL = i, writefunction = content[[i]]$update, ...)
    curlSetOpt(.opts = opts, curl = curl)
    multiHandle <- push(multiHandle, curl)
  }

  if (.perform) {
    # Block until every request has finished, then return the gathered bodies
    complete(multiHandle)
    lapply(content, function(x) x$value())
  } else {
    # Hand the multi handle back so the caller can drive it later
    list(multiHandle = multiHandle, content = content)
  }
}

There is also RCurl::getURIAsynchronous.
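For example, it can be called directly on a vector of URLs; a minimal sketch (the URLs here are just placeholders):

library(RCurl)

urls <- c("https://httpbin.org/get", "https://httpbin.org/ip")
txt  <- getURIAsynchronous(urls)  # fetches the URIs concurrently and returns their bodies as text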

Here's an example use case where this would be helpful. I want to submit 1,000 requests to the server, and each request takes 10 minutes to process (the server has to look up some data and do some math that takes a long time). However, the server can handle many thousands of simultaneous requests.

Currently, I'm looping through something like this:

requests <- lapply(urls, POST, ...)

This blocks on each request, so it takes 1,000 x 10 minutes to complete. It would be really nice to be able to send each request off to the server without blocking on its completion. Then, after they have all been submitted, we can block on collecting the results with a loop like this:

requests <- lapply(urls, POST, ..., async = TRUE)
results  <- lapply(requests, httr::complete)

Where httr::complete would be similar to RCurl::complete. The second example is nice because we can submit all the requests at once and let the server start processing them before we block on gathering the results. In theory, this loop would take ~10 minutes to complete, plus the overhead of the two lapply loops.
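To make that concrete, here is a rough sketch of how the submit-then-collect pattern could be written today with the RCurl multi handle from the example above. The endpoint URLs and request body are placeholders, and submit_posts is a hypothetical helper, not an httr or RCurl function:

library(RCurl)

submit_posts <- function(urls, body) {
  multiHandle <- getCurlMultiHandle()
  gatherers <- list()

  # Queue every request up front so the server can start working on all of them
  for (u in urls) {
    gatherers[[u]] <- basicTextGatherer()
    h <- getCurlHandle()
    curlSetOpt(url = u, postfields = body,            # postfields turns the request into a POST
               writefunction = gatherers[[u]]$update, curl = h)
    multiHandle <- push(multiHandle, h)
  }

  complete(multiHandle)                               # block once while everything runs concurrently
  lapply(gatherers, function(g) g$value())            # the collected response bodies
}

results <- submit_posts(sprintf("https://example.com/api/job/%d", 1:1000), body = "payload")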

@hamelsmu

I have the exact same use case as Zach above and would love to see a solution to this.

@hadley
Member

hadley commented Dec 17, 2015

I think this should probably be an issue in https://github.com/jeroenooms/curl/issues. Once curl has an API for async requests, httr can wrap around it.

@zachmayer

@hadley curl now experimentally supports async: jeroen/curl#51
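For anyone finding this now, a minimal sketch of that experimental interface (the URLs are just placeholders):

library(curl)

pool <- new_pool()
results <- list()

for (u in c("https://httpbin.org/get", "https://httpbin.org/ip")) {
  multi_add(new_handle(url = u),
            done = function(res) results[[res$url]] <<- rawToChar(res$content),
            fail = function(msg) message("failed: ", msg),
            pool = pool)
}

multi_run(pool = pool)  # performs all queued requests concurrently, then returns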

@rentrop

rentrop commented Jul 16, 2016

Any thoughts on this issue? It would be great to see this in httr and rvest.

@abeburnett

Wandered across this because I need exactly this functionality: issuing async requests via httr. Is that possible now in R via httr? (I just love httr, so I hope so!)

@fabiangehring

fabiangehring commented Apr 28, 2017

Just came across this post because I'd be interested in this as well. The curl package seems to support asynchronous requests now. Any chance httr could wrap around it, as @hadley suggested some time ago? Edit: Just noticed that the issue is closed. Wouldn't it be worth reopening it?

@timwilliate
Author

I agree with @fabiangehring. Now that the dependent functionality in curl is available, could this issue be reopened, @hadley?

@abeburnett

Any word on this yet? My scraper now takes 13 hours to run synchronously; making it asynchronous would speed it up a lot!

@jeroen
Member

jeroen commented Jul 10, 2017

Why not just use curl directly?
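For the submit-then-collect use case above, using curl directly might look roughly like this. The endpoint URLs, form field, and connection limits are placeholders, not a recommendation:

library(curl)

urls <- sprintf("https://example.com/api/job/%d", 1:1000)
pool <- new_pool(total_con = 1000, host_con = 1000)   # raise limits so the requests really go out together
bodies <- new.env()

for (u in urls) {
  h <- new_handle(url = u)
  handle_setform(h, query = "some-long-running-job")  # a form body makes this a POST
  multi_add(h,
            done = function(res) bodies[[res$url]] <- rawToChar(res$content),
            fail = function(msg) warning(msg),
            pool = pool)
}

multi_run(pool = pool)   # roughly one 10-minute wait instead of 1,000 sequential ones
results <- as.list(bodies)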

@timwilliate
Author

@jeroen My first thought is that the semantics offered by httr are quite nice and friendly. If you have written a lot of code using httr, the ideal scenario is to have an httr-friendly way to handle async requests.

@jeroen
Member

jeroen commented Jul 10, 2017

I'm not sure how the httr semantics would generalize to async requests though.

@king-of-poppk

> I'm not sure how the httr semantics would generalize to async requests though.

Return a promise?
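For what it's worth, a promise-returning wrapper can already be sketched on top of the curl multi interface plus the promises and later packages. async_GET and poll_pool below are hypothetical names, not part of httr:

library(curl)
library(promises)
library(later)

pool <- new_pool()

# Hypothetical: start a GET and immediately return a promise for the curl response
async_GET <- function(url, pool) {
  promise(function(resolve, reject) {
    multi_add(new_handle(url = url), done = resolve, fail = reject, pool = pool)
  })
}

# Drive the pool from the later event loop so the R session itself is never blocked
poll_pool <- function(pool) {
  multi_run(timeout = 0, pool = pool)                 # non-blocking pass over pending handles
  if (length(multi_list(pool)) > 0)
    later(function() poll_pool(pool), delay = 0.1)
}

p <- async_GET("https://httpbin.org/get", pool)
then(p, onFulfilled = function(res) cat("status:", res$status_code, "\n"))
poll_pool(pool)

An httr-level API could then hand such promises back to the caller instead of blocking.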
