For anyone who’s ever tried to fetch multiple resources over HTTP in PHP, the logic is trivial, but one key challenge is ever-present: latency. Even when a web server has a perfectly good downstream link, network latency can increase script execution time tenfold just from downloading a few external URLs. There’s a simple solution, though: parallel cURL operations. In this tutorial, I’ll show you how to use the “multi” functions in PHP’s cURL library to get around this quickly and easily.
Caching alleviates the latency issue to some extent, but retrieving more than a few files is always going to be a problem, and, well, sometimes users just can’t wait. cURL’s parallel processing lets you fire off multiple requests at once and handle the responses as they arrive, instead of working linearly: waiting for each request to complete (or worse, time out) before starting the next.
Consider this basic cURL example:
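A minimal sketch of that snippet (assuming CURLOPT_RETURNTRANSFER is set so the body is returned rather than echoed):

```php
<?php
// One blocking request: the script waits here until the full response arrives.
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$data = curl_exec($ch);
curl_close($ch);
?>
```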
This will fetch the resource at http://example.com/ and put the data (the HTML) in $data. If we wanted to do this multiple times, we could wrap the block in a simple for loop and repeat. With that approach, however, script execution time grows linearly with the latency of each network request, and latencies of 50-100ms across 10 requests hurt when all of your PHP code takes less than 10ms to execute.
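For illustration, that naive sequential version might look like this (the URL list is just a placeholder):

```php
<?php
// Naive approach: each iteration blocks until its request finishes,
// so total time is roughly the sum of all the individual latencies.
$urls = ['http://example.com/', 'http://example.org/', 'http://example.net/'];
$pages = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $pages[$url] = curl_exec($ch);
    curl_close($ch);
}
?>
```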
Instead, we’ll use cURL’s parallel processing system. This requires a bit of a context shift: rather than running each operation yourself, you tell cURL about all the operations to run, let it do its stuff, and then continue once it has finished. The difference is that it doesn’t wait for each request; it runs them all simultaneously (network permitting). Here’s a basic example:
```php
<?php
// Create a cURL handle for each request and set its options.
$ch1 = curl_init('http://example.com/');
$ch2 = curl_init('http://example.org/');
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);

// Create the multi handle and attach both requests to it.
$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

// Hand control to cURL: keep calling curl_multi_exec() until
// $running reports that no requests are still in flight.
$running = null;
do {
    curl_multi_exec($mh, $running);
} while ($running > 0);

// Both responses are ready; grab them.
$data1 = curl_multi_getcontent($ch1);
$data2 = curl_multi_getcontent($ch2);

curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
```
Here, we first create a cURL connection object for each request we want to make (an array of these is perfectly acceptable) and set the options on each. Instead of calling curl_exec() on each handle, we call curl_multi_init() and point the library at each of our connection objects with curl_multi_add_handle(). We then have to cede control to cURL: curl_multi_exec() runs all the sub-connections of the multi handle. Its second parameter is a reference to a flag that reports whether operations are still running; when that flag, $running in our example, drops to 0, cURL has finished all the requests and we can proceed. Timeouts are the only real concern: remember to set the timeout option on each handle to a reasonable value to avoid being held up by requests that won’t complete.
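A stuck request will keep $running above zero indefinitely, so a sketch of the per-handle timeout options (standard cURL settings, applied here to $ch1 from the example above, with purely illustrative values) looks like this:

```php
// Cap how long this handle may take before cURL gives up on it.
curl_setopt($ch1, CURLOPT_CONNECTTIMEOUT, 5);  // seconds allowed to establish the connection
curl_setopt($ch1, CURLOPT_TIMEOUT, 15);        // seconds allowed for the whole request
```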
As the number of requests grows, however, remember that the next bottleneck you will hit is memory. Given PHP’s memory_limit setting (especially on shared servers), you can hit it quicker than you think. Used this way, cURL doesn’t hand you a response piece by piece. Ever seen fread($handle, 8192)? The second parameter reads the file in 8KB chunks, sidestepping memory limits elegantly. curl_multi_getcontent(), by contrast, simply buffers every response in full: if you hit your memory limit, you get either a fatal error or a “white screen of death”. Consider doing your parallel cURL-ing in a background process. Also, these simple routines can get quite lengthy, so consider building an abstraction layer for your app or framework that maintains the array of cURL connection handles and exposes a single interface for making requests; a rough sketch follows.
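One way such a layer might look (the class and method names here are hypothetical, not part of any library):

```php
<?php
// Minimal parallel-fetch wrapper: collects handles, runs them through one multi handle.
class ParallelFetcher
{
    /** @var array keyed cURL handles */
    private $handles = [];

    public function add($key, $url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        $this->handles[$key] = $ch;
    }

    public function run()
    {
        $mh = curl_multi_init();
        foreach ($this->handles as $ch) {
            curl_multi_add_handle($mh, $ch);
        }

        $running = null;
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh); // wait for activity rather than spinning the CPU
        } while ($running > 0);

        $results = [];
        foreach ($this->handles as $key => $ch) {
            $results[$key] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
        }
        curl_multi_close($mh);

        return $results; // body per key, or false where a request failed
    }
}

// Usage: queue up requests, then run them all at once.
$fetcher = new ParallelFetcher();
$fetcher->add('home', 'http://example.com/');
$fetcher->add('feed', 'http://example.org/feed');
$pages = $fetcher->run();
?>
```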
Parallel HTTP requests with cURL are quick and easy, despite the structural differences from ordinary single-request cURL code. To learn more, just check out the detailed documentation in the PHP manual.