I'm coding an algorithm to remove a parameter (let's call it foo) from URL strings.
Of course, after the foo parameter removal, the query string should remain valid (with a leading ? and remaining parameters separated by &).
I'd also like to remove the leading ? if foo was the only parameter.
Input examples:
http://example.com/?foo=42
http://example.com/?foo=42&bar=43
http://example.com/?bar=43&foo=42
http://example.com/?bar=43&foo=42&baz=44
Expected output:
http://example.com/
http://example.com/?bar=43
http://example.com/?bar=43
http://example.com/?bar=43&baz=44
My initial algorithm:
preg_replace_callback('/([?&])foo=[^&]+(&|$)/', function($matches) {
    return $matches[2] ? $matches[1] : '';
}, $url);
The regex itself is rather simple. The callback logic is as follows:
- If foo is not the last parameter (2nd capturing group is not end of string), then the whole match is replaced by the first capturing group (? or &). This handles:
  ?foo=valuefoo&bar -> ?bar
  &foo=valuefoo&bar -> &bar
- If foo is the last parameter, then the whole match is replaced by an empty string. This handles:
  ?bar=valuebar&foo=valuefoo -> ?bar=valuebar
  ?foo=valuefoo -> (empty string)
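To sanity-check the two branches described above, here is a minimal sketch that wraps the callback version in a function and runs it over the four sample inputs. The name removeFooWithCallback is hypothetical and used only for this illustration. One caveat worth noting: as written, [^&]+ requires a non-empty value, so a bare foo= would not be matched.

```php
<?php
// Hypothetical wrapper around the callback approach from the question.
function removeFooWithCallback($url) {
    return preg_replace_callback('/([?&])foo=[^&]+(&|$)/', function ($matches) {
        // Group 2 is '&' when another parameter follows, '' at end of string.
        // Keep the leading separator only when something follows it.
        return $matches[2] ? $matches[1] : '';
    }, $url);
}

$inputs = array(
    'http://example.com/?foo=42',
    'http://example.com/?foo=42&bar=43',
    'http://example.com/?bar=43&foo=42',
    'http://example.com/?bar=43&foo=42&baz=44',
);

foreach ($inputs as $input) {
    echo removeFooWithCallback($input), "\n";
}
```

Run against the sample inputs, this should print the four expected URLs listed earlier.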
This logic seemed rather complicated, hence I rewrote it into a single regex:
preg_replace('/[?&]foo=[^&]+$|([?&])foo=[^&]+&/', '$1', $url);
Now both logic branches are separated by the regex OR | and the 1st capturing group only occurs in the "foo is not the last parameter" branch.
The problem is that the code is no longer very DRY: I'm repeating foo=[^&]+ twice in the expression.
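One possible way to keep the two-branch regex while writing the repeated fragment only once is to build the pattern from a variable. This is a sketch under assumptions: removeParamWithRegex and its $name argument are hypothetical, and the resulting pattern is the same as the single-regex version above with foo substituted in.

```php
<?php
// Hypothetical helper: assembles the two-branch pattern from one shared fragment.
function removeParamWithRegex($url, $name) {
    // preg_quote() escapes regex metacharacters in the name; '/' is the delimiter.
    $pair = preg_quote($name, '/') . '=[^&]+';

    // Branch 1: parameter is last (nothing is captured, so $1 is empty).
    // Branch 2: parameter is followed by '&' (the leading separator is kept via $1).
    $pattern = '/[?&]' . $pair . '$|([?&])' . $pair . '&/';

    return preg_replace($pattern, '$1', $url);
}

echo removeParamWithRegex('http://example.com/?bar=43&foo=42&baz=44', 'foo'), "\n";
```

The trade-off is that the pattern is no longer readable at a glance as one literal string, which may or may not count as more maintainable.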
I've looked at regex conditionals, but it seems like they would result in even more unmaintainable code than my first approach. In fact, the linked page states that expressions using conditionals quickly become unwieldy, and it recommends simply using capturing groups and validating them in source code, which is easier to understand and maintain.
This seemed like a simple task at first glance, but now I'm wondering whether I should even use Regex for this.
Right now I'm thinking about substr'ing from the first ?, explode'ing the query string at &, array_filter'ing based on the parameter names, then implode'ing and concatenating the result to the URL again, but this looks overly verbose.
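For reference, the string-splitting steps just described might look roughly like the sketch below. The function name removeParamByExplode is hypothetical, and fragments are ignored here, in line with the scope of the question.

```php
<?php
// Hypothetical sketch of the substr / explode / array_filter / implode idea.
function removeParamByExplode($url, $name) {
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string: nothing to do
    }

    $base  = substr($url, 0, $pos);
    $query = (string) substr($url, $pos + 1);

    // Keep every "name=value" pair whose name differs from the one to remove.
    $kept = array_filter(explode('&', $query), function ($pair) use ($name) {
        $parts = explode('=', $pair, 2);
        return $parts[0] !== $name;
    });

    // Drop the leading '?' when no parameters are left.
    return $kept ? $base . '?' . implode('&', $kept) : $base;
}

echo removeParamByExplode('http://example.com/?bar=43&foo=42&baz=44', 'foo'), "\n";
```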
We can ignore URL hashes/fragments for the purpose of this question.
Is there a better approach (mainly in terms of maintainability) to remove a query string parameter?
For testing: Codepad
Following @rolfl's advice to use native methods, as well as stealing some code from the PHP Docs' comments, here is my new approach:
//http://www.php.net/manual/en/function.parse-url.php#106731
function unparse_url($parsed_url) {
    $scheme   = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
    $host     = isset($parsed_url['host']) ? $parsed_url['host'] : '';
    $port     = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
    $user     = isset($parsed_url['user']) ? $parsed_url['user'] : '';
    $pass     = isset($parsed_url['pass']) ? ':' . $parsed_url['pass'] : '';
    $pass     = ($user || $pass) ? "$pass@" : '';
    $path     = isset($parsed_url['path']) ? $parsed_url['path'] : '';
    $query    = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
    $fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
    return "$scheme$user$pass$host$port$path$query$fragment";
}

function removeQueryParam($url, $param_to_remove) {
    $parsed = parse_url($url);
    if ($parsed && isset($parsed['query'])) {
        $parsed['query'] = implode('&', array_filter(explode('&', $parsed['query']), function($param) use ($param_to_remove) {
            return explode('=', $param)[0] !== $param_to_remove;
        }));
        if ($parsed['query'] === '') unset($parsed['query']);
        return unparse_url($parsed);
    } else {
        return $url;
    }
}
It works fine even with hashes/fragments now. Is there anything else to be improved? As far as I can see, there's no native method to parse a query string into an array, hence the explode, array_filter and implode method is the most maintainable I could get.
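For comparison, PHP does ship parse_str() for turning a query string into an array and http_build_query() for the reverse, so a variant built on those is possible. The sketch below is only an illustration under assumptions: removeQueryParamNative is a hypothetical name, and the round trip is not byte-for-byte faithful. parse_str() decodes values, converts dots and spaces in parameter names to underscores, and collapses duplicate names, while http_build_query() re-encodes everything (PHP_QUERY_RFC3986 is passed here so spaces come back as %20 rather than +).

```php
<?php
// Hypothetical variant using parse_str() / http_build_query().
function removeQueryParamNative($url, $param_to_remove) {
    $query = parse_url($url, PHP_URL_QUERY);
    if ($query === null || $query === false) {
        return $url; // no query string (or unparsable URL): leave untouched
    }

    parse_str($query, $params); // decodes names and values into an array
    if (!array_key_exists($param_to_remove, $params)) {
        return $url; // parameter absent: avoid re-encoding the query needlessly
    }
    unset($params[$param_to_remove]);

    $rebuilt     = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
    $replacement = $rebuilt === '' ? '' : '?' . $rebuilt;

    // The first '?' in the URL starts the query; swap "?<old query>" in place
    // so the scheme, host, path and any fragment are left as they were.
    $pos = strpos($url, '?');
    return substr_replace($url, $replacement, $pos, strlen($query) + 1);
}

echo removeQueryParamNative('http://domain.com.uk/pathname?foo=42&bar=bar%20value', 'foo'), "\n";
```

Whether the normalisation done by parse_str() is acceptable depends on the URLs involved; for already-encoded URLs that must be preserved exactly, the explode-based version avoids it.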
Update: just to clarify a couple things, seeing @MrLore's answer made me realize my question lacked a couple details.
- Domain and pathname should be preserved.
- The URLs may not contain a query string. Expected output is the same as input.
- The URLs may contain a query string which does not contain the foo parameter. Expected output is the same as input.
- The URLs are already properly URL-encoded.
- Fragments (hashes) don't need to be considered -- my current use case's URLs do not have fragments, though I believe keeping them would be nice for future-proofing. Just a little extra, not necessary for the accepted answer.
Here are a couple more inputs and expected outputs to make these rules clearer:
Input examples:
http://domain.com.uk/pathname?foo=42&bar=bar%20value
http://yahoo.com/mail
http://nofoo.com/?bar=43
Expected output:
http://domain.com.uk/pathname?bar=bar%20value
http://yahoo.com/mail
http://nofoo.com/?bar=43
=]
– Fabrício Matté Dec 26 '13 at 10:49