I've been writing a small site which is essentially just a service API for other applications but deals with the caching and serving of files. In an effort to make the site as scalable and maintainable as possible I did some research into the most efficient ways to serve files to the client, using PHP for access control.
I figured I'd probably need this sort of functionality again somewhere down the line so I decided to write a small-ish static utility class to serve up files. I'd like feedback on the overall robustness/efficiency of the code as well as comments on how well this follows the HTTP specifications. Improvements are always welcome as I haven't had a chance to really stress test the code yet.
A few of the optimizations are really targeted towards an Apache web server, but the code itself should work on any PHP box. If anyone knows how to accurately check for the mod_xsendfile module (or its equivalent) on Nginx/lighttpd, that would be a great addition.
<?php
class FileServer {
const TRANSFER_CHUNK_SIZE = 8192;
/**
* void serveFile(string $filepath, string $realname, string $mimeType, [bool $publicFile=true, [bool $allowPartial=true, [callable $callback=false ]]])
* Serve the file residing at $filepath to the client.
* @param $filepath The absolute or relative path to the file on disk
* @param $realname The name to give the client for this file (such as in the Save dialog from a browser)
* @param $mimeType The desired MIME type to send with the response (allows for custom MIME types rather than just using PECL's Fileinfo)
* @param $publicFile Whether the file is safe for public access or not (pass false if you want to hide the true location of the file from clients). Optional, defaults to true
* @param $allowPartial Whether or not to accept partial requests (via the HTTP Range header) to download the file. Optional, defaults to true
* @param $callback An optional callback to invoke before terminating the script, allows for any cleanup code to run.
* Function signature should match 'int function(int)' where the parameter indicates the status code being sent to the client.
* The function should return the desired exit code
*
* @remarks After calling this function the script will be guaranteed to terminate on all branches after invoking $callback
*/
public static function serveFile($filepath, $realname, $mimeType, $publicFile=true, $allowPartial=true, $callback=false) {
if (!is_file($filepath)) {
header('HTTP/1.0 404 Not Found', true, 404);
exit(self::invokeCallback($callback, 404));
}
$size = filesize($filepath);
$headers = array();
// get all available headers
foreach($_SERVER as $k=>$v) {
$key = strtolower($k);
if (strpos($key, 'http_') === 0) { $key = substr($key, 5); }
$headers[$key] = $v;
}
// pick up any apache http headers that weren't in $_SERVER (need reference as to whether this is even possible?)
if (function_exists('apache_request_headers')) {
$headers += array_change_key_case(apache_request_headers());
}
// check if a range was specified
$range = (isset($headers['range']) ? $headers['range'] : false);
if ($range !== false && $allowPartial) { // need to handle a partial request
if (($ranges = self::parseRange($range, $size)) === false) { // badly formatted range from client
header('HTTP/1.1 416 Requested Range Not Satisfiable', true, 416);
header('Content-Range: bytes */' . $size, true);
exit(self::invokeCallback($callback, 416));
}
}
// Allow for some caching optimization, although in my experience this won't hit too often from browsers.
$ims = !empty($headers['if_modified_since']) ? $headers['if_modified_since'] : false;
$inm = !empty($headers['if_none_match']) ? $headers['if_none_match'] : false;
if (self::cacheControl($filepath, $ims, $inm)) {
header("HTTP/1.1 304 Not Modified", true, 304);
exit(self::invokeCallback($callback, 304));
}
if (function_exists('apache_get_modules')) { // try to optimize with apache (x-sendfile header)
if (in_array('mod_xsendfile', apache_get_modules())) {
// note: X-Sendfile claims to handle HTTP_RANGE headers properly,
// so that is why this is the leading code branch
header('X-Sendfile: ' . $filepath);
header("Content-Type: {$mimeType}");
header("Content-Disposition: attachment; filename=\"{$realname}\"");
exit(self::invokeCallback($callback, 200));
}
}
// Common headers
header("Content-Type: {$mimeType}", true);
header("Accept-Ranges: " . ($allowPartial ? 'bytes' : 'none'), true);
// send a partial request
if ($range !== false && $allowPartial) {
$contentRanges = self::implodeAssoc($ranges, '-', ',');
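// send appropriate partial header info
// (note: a single Content-Range header is only valid for one range; per the HTTP spec,
// a response covering several ranges needs a multipart/byteranges body instead)
header("HTTP/1.1 206 Partial Content", true, 206);
header("Content-Range: bytes {$contentRanges}/{$size}", true);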
if (($fp = fopen($filepath, 'r')) === false) {
header("HTTP/1.0 500 Internal Server Error", true, 500);
exit(self::invokeCallback($callback, 500));
}
foreach($ranges as $start=>$end) {
$length = ($end - $start) + 1;
// Seek to the start of this range in the already-open file stream
if ($start > 0 && fseek($fp, $start, SEEK_SET) === -1) {
header("HTTP/1.0 500 Internal Server Error", true, 500);
exit(self::invokeCallback($callback, 500));
}
// Transfer the data, one TRANSFER_CHUNK_SIZE block at a time to
// reduce the memory footprint on larger files
$chunks = (int)($length / self::TRANSFER_CHUNK_SIZE);
$delta = $length % self::TRANSFER_CHUNK_SIZE;
for($i = 0; $i < $chunks; ++$i) {
echo fread($fp, self::TRANSFER_CHUNK_SIZE);
}
// handle the residual data that didn't align along TRANSFER_CHUNK_SIZE
echo fread($fp, $delta);
}
fclose($fp);
exit(self::invokeCallback($callback, 206));
}
// By now it's a pretty grim situation for file i/o.
// Possible redirect opportunity using the Location header. Note: this only works
// on public documents, $publicFile should be false when serving restricted content.
// Also, the file must reside in the document root to be accessible by setting Location
if ($publicFile && ($url = self::filepathToUrl($filepath, true)) !== false) {
header("Location: {$url}", true);
exit(self::invokeCallback($callback, 302));
}
// Give up, going to have to use PHP :)
header("Content-Length: {$size}", true);
header("Content-Disposition: attachment; filename=\"{$realname}\"", true);
readfile($filepath);
exit(self::invokeCallback($callback, 200));
}
// Pretty self-explanatory, just a helper to abstract the way ETags are generated
public static function generateETag($filepath, $salt='') {
return hash('sha256', $filepath . $salt);
}
// Another lazy helper to validate a callback and invoke it, or return a default value
private static function invokeCallback($callback, $response, $default=0) {
if (is_callable($callback)) {
return call_user_func($callback, $response);
}
return $default;
}
private static function cacheControl($filepath, $ifModifiedSince, $ifNoneMatch) {
// Do the caching housekeeping.
// Function returns true if the cached version is up-to-date (i.e a 304 is acceptable), or false otherwise
$mtime = filemtime($filepath);
$time = gmdate('D, d M Y H:i:s \G\M\T', $mtime);
$etag = self::generateETag($filepath, $mtime);
if ($ifModifiedSince !== false || $ifNoneMatch !== false) {
if ($ifModifiedSince == $time || $etag == str_replace('"', '', stripslashes($ifNoneMatch))) {
return true;
}
}
// send some validation headers for cache-control later
header('Last-Modified: ' . $time, true);
header('Cache-Control: must-revalidate', true);
header('ETag: "' . $etag . '"', true); // entity tags must be sent as quoted strings
return false;
}
private static function implodeAssoc($assoc, $keyValueSeparator, $entrySeparator) {
// A really, really lazy way to implode an array by first delimiting the keys and values by one delimiter
// then delimiting this new array by another delimiter
return implode($entrySeparator,
array_map(function($k,$v) use($keyValueSeparator) {
return "{$k}{$keyValueSeparator}{$v}";
}, array_keys($assoc), $assoc)
);
}
private static function filepathToUrl($filepath, $relative=false) {
// normalize to *Nix path separators
$root = str_replace('\\', '/', $_SERVER['DOCUMENT_ROOT']);
$filepath = str_replace('\\', '/', realpath($filepath));
$relpath = str_replace($root, '', $filepath);
if ($filepath === $relpath && $filepath[0] !== '/') // can't convert the absolute path to a URL because it's outside the document root
return false;
if ($relpath[0] != '/') $relpath = '/' . $relpath;
if (!$relative) {
$protocol = "http" . ((empty($_SERVER['HTTPS']) || $_SERVER['HTTPS'] === 'off') ? '' : 's') . '://';
return $protocol . $_SERVER['SERVER_NAME'] . $relpath;
}
return $relpath;
}
/**
* Parse an HTTP Range header value into an array of 'start' => 'end' ranges
* @param $range the raw Range header value to parse
* @param $filesize the size of the file that is being partially requested
*
* @return an associative array of 'start' => 'end' integer pairs if the range is valid, false otherwise
*/
private static function parseRange($range, $filesize) {
if (!preg_match('/^bytes=\d*-\d*(,\d*-\d*)*$/', $range)) {
return false;
}
$ranges = explode(',', substr($range, 6));
$specs = array();
for($i = 0; $i < count($ranges); ++$i) {
$parts = explode('-', $ranges[$i]);
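// Compare against '' rather than using empty(): empty('0') is true in PHP,
// which would misread "0-499" and "0-" as having no start offset
if ($parts[0] === '' && $parts[1] === '') {
return false; //have to specify at least one side of the range
}
// Try to comply with the standard as best I understand it here
if ($parts[0] === '') { // suffix range "-N": the last N bytes of the file
$start = max(0, $filesize - intval($parts[1]));
$end = $filesize - 1;
} else { // "N-M" or "N-": clamp the end to the last byte of the file
$start = intval($parts[0]);
$end = ($parts[1] !== '') ? min(intval($parts[1]), $filesize - 1) : $filesize - 1;
}
if ($start > $end) {
return false;
}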
$specs[$start] = $end;
}
return $specs;
}
}
?>
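To make the Range rules easier to check against the spec, here is a minimal standalone sketch of how a single byte range resolves under RFC 7233 as I understand it. `resolveByteRange()` is a hypothetical helper written for illustration, not part of the class above. It handles only one range (the text after `bytes=`) and returns null for anything malformed or unsatisfiable.

```php
<?php
// Hypothetical helper (an assumption for illustration, not part of FileServer).
// Resolves ONE byte-range-spec such as "0-499", "500-" or "-200" against a
// file of $size bytes. Returns array($start, $end) with inclusive offsets,
// or null when the spec is malformed or cannot be satisfied.
function resolveByteRange($spec, $size) {
    if (!preg_match('/^(\d*)-(\d*)$/', $spec, $m)) {
        return null; // not of the form "first-last"
    }
    if ($m[1] === '' && $m[2] === '') {
        return null; // at least one side must be given
    }
    if ($m[1] === '') {
        // Suffix form "-N": the last N bytes, or the whole file if N > size
        $n = (int)$m[2];
        if ($n === 0 || $size === 0) {
            return null;
        }
        return array(max(0, $size - $n), $size - 1);
    }
    $start = (int)$m[1];
    // A missing or too-large end position is clamped to the last byte
    $end = ($m[2] === '') ? $size - 1 : min((int)$m[2], $size - 1);
    if ($start >= $size || $start > $end) {
        return null;
    }
    return array($start, $end);
}

echo json_encode(resolveByteRange('0-499', 1000)), "\n"; // [0,499]
echo json_encode(resolveByteRange('500-', 1000)), "\n";  // [500,999]
echo json_encode(resolveByteRange('-200', 1000)), "\n";  // [800,999]
echo json_encode(resolveByteRange('1000-', 1000)), "\n"; // null
```

The sketch compares against `''` instead of calling `empty()` because `empty('0')` is true in PHP, so a start offset of 0 would otherwise be treated as missing.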
Appreciate the advice.