I'm using the RollingCurl library to fire HTTP requests and read the Content-Length header of images (we need to know their size to weed out placeholders and low-res images). The image URLs are stored in a database, so I need to loop over the data in our products table (approx. 1 million rows, but it will grow, potentially much bigger).
I'm using PHP and the Laravel framework (the Artisan CLI component). The operation slows down as time progresses: it starts out processing 100 requests in less than a second, and later the time to process 100 rows/requests is logged at over 20 seconds. Can anyone explain this and/or suggest any performance improvements? The task is running on an Amazon EC2 micro instance, so processing power and memory are limited.
    public function fire()
    {
        $dt = new DateTime();
        Log::info("started: ".$dt->format('Y-m-d H:i:s'));
        $counter = 1;

        // process the rows that have no image size yet, 1000 at a time
        Item2::where('img_size', '=', NULL)->chunk(1000, function($items) use (&$counter)
        {
            $results = array();

            // one CSV file per chunk
            $filePath = storage_path().'/imports/new/new_img_sizes_'.$counter.'.csv';
            if (!File::exists($filePath)) {
                File::put($filePath, '');
            }

            $start = microtime(true);
            $rollingCurl = new \RollingCurl\RollingCurl();
            $rollingCurl->setOptions($this->curlOptions);

            // queue one request per item that has an image URL
            foreach ($items as $item)
            {
                if ($item->img !== '') {
                    $results[$item->id] = array('url' => $item->img, 'size' => null);
                    $rollingCurl->get($item->img);
                }
            }

            // callback runs on each curl request
            $rollingCurl->setCallback(function(\RollingCurl\Request $request, \RollingCurl\RollingCurl $rollingCurl) use (&$results, $filePath) {
                $responseInfo = $request->getResponseInfo();
                //var_dump($responseInfo);exit;
                $length = $responseInfo['download_content_length'];

                // find the item this URL belongs to and record its size
                foreach ($results as $key => $value) {
                    if (array_search($request->getURL(), $value)) {
                        $idKey = $key;
                        $results[$idKey]['size'] = $length;
                        File::append($filePath, $idKey.','.$results[$idKey]['size']."\r\n");
                        break;
                    }
                }
            })
            ->setSimultaneousLimit(10)
            ->execute();

            $counter++;
            echo 'done in...'.(microtime(true) - $start).PHP_EOL;
            Log::info('1000 records: '.(microtime(true) - $start));
            Log::info('Last url was: '.json_encode(end($results)));
            exit; // halts the script here, after the first chunk
        }); // end item chunk
    }
Some benchmarks (seconds per chunk, as printed by the echo above):
done in...3.8803641796112
done in...7.4326379299164
done in...8.1860301494598
done in...8.5088090896606
done in...10.606615781784
done in...10.655412912369
done in...10.804574966431
done in...14.004528045654
done in...10.903785943985
done in...11.905344009399
done in...13.763195991516
done in...14.723680019379
done in...15.823812961578
done in...17.972007989883
done in...31.734715938568
done in...20.509822845459
done in...22.924754858017
done in...34.274693012238
done in...39.217702865601
done in...29.883662939072
done in...24.094554901123
done in...25.726534128189
done in...31.788655996323
done in...24.713880062103
done in...25.855134963989
done in...23.161122083664
done in...32.380167007446
done in...36.53077507019
done in...31.859884023666
done in...71.458341121674