Sanitising user inputs and user inputted data outputs

Question

I am creating a function for sanitising user input and for escaping user-supplied data when it is output.

Please offer advice on any improvements that could be made:

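function cleanse($input) {

    // Patterns are applied in order, so the <style> rule must run before the
    // generic tag rule; otherwise the tags are gone and the CSS text remains.
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out JavaScript
        '@<style[^>]*?>.*?</style>@si',     // Strip style tags and their contents
        '@<[\/\!]*?[^<>]*?>@si',            // Strip out remaining HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
    );

    return preg_replace($search, '', $input);
}

function sanitise($input) {
    if (is_array($input)) {
        $output = array(); // so an empty array returns an empty array
        foreach ($input as $var => $val) {
            $output[$var] = sanitise($val); // recurse into nested arrays
        }
    } else {
        $input  = cleanse($input);
        // ENT_QUOTES escapes both quote styles; false = don't double-encode
        $output = htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false);
    }
    return $output;
}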
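The last step of sanitise() is plain output escaping with htmlspecialchars(). A small sketch of what the arguments used there do: ENT_QUOTES escapes both double and single quotes, and passing false as the fourth argument (double_encode) leaves entities that are already present, such as &amp;, untouched. The sample string is made up for illustration.

```php
<?php
// Same call as in sanitise(): escape both quote styles, treat the input as
// UTF-8, and do not re-encode entities that are already present.
$raw = "<b>\"Tom\" & 'Jerry' &amp;</b>";
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8', false);
echo $escaped, "\n";
// &lt;b&gt;&quot;Tom&quot; &amp; &#039;Jerry&#039; &amp;&lt;/b&gt;

// With double_encode left at its default (true), the existing entity is
// encoded a second time.
$reencoded = htmlspecialchars('&amp;', ENT_QUOTES, 'UTF-8');
echo $reencoded, "\n";
// &amp;amp;
```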

Pinoniq · Answer 1 · 2014-05-14 15:56:55Z


Personally, I tend to work with a white-list of characters.

So instead of sanitizing input and removing bad stuff, I only accept good stuff, e.g. only values matching ^[0-9]+$ when validating an age field (anchored, so the whole value must be digits rather than merely containing one).

This approach is less error-prone because you don't have to think of every bad thing someone could enter. You also know that the data you are serving is actually what you think it is (e.g. a number for an age).
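A minimal sketch of the white-list idea in PHP, assuming an age field that should contain only digits and fall within a plausible range. The function names and the 0 to 150 bounds are illustrative assumptions, not part of the answer. The second function expresses the same rule with PHP's built-in filter_var() and FILTER_VALIDATE_INT; the two can differ in edge cases, so test whichever you choose against your own data.

```php
<?php
// Hypothetical helper: accept an age only if it consists entirely of digits
// and lies within an assumed range. Anything else is rejected, not "cleaned".
function validate_age($input) {
    // ^ and \z anchor the whole string; \z (unlike $) also rejects a
    // trailing newline.
    if (!is_string($input) || preg_match('/^[0-9]+\z/', $input) !== 1) {
        return null;
    }
    $age = (int) $input;
    return ($age >= 0 && $age <= 150) ? $age : null; // assumed bounds
}

// The same white-list rule using PHP's filter extension.
function validate_age_filter($input) {
    $age = filter_var($input, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 0, 'max_range' => 150),
    ));
    return $age === false ? null : $age;
}

var_dump(validate_age('42'));        // int(42)
var_dump(validate_age('42abc'));     // NULL
var_dump(validate_age_filter('42')); // int(42)
```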


I'm working with sensitive information, and nearly all user data is input and output as strings; inputs that are meant to be numeric are already validated to ensure they are numeric, for example. In the situation described, am I taking an approach which is not optimal? – danielsmile May 14 '14 at 16:25

@danielsmile It's not that it isn't optimal; it's that accepting only known-good data is easier to maintain in the long run, because you only have to think about what is allowed rather than everything that isn't. – glitchmunki May 14 '14 at 16:37
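The white-list approach also works for free-text string fields. A minimal sketch, assuming a hypothetical "name" field where letters, spaces, apostrophes and hyphens are the only permitted characters and 100 characters is an assumed maximum; the right rules depend on the data each field is expected to hold.

```php
<?php
// Hypothetical white-list for a free-text "name" field. The permitted
// characters and the 100-character limit are assumptions for illustration.
function validate_name($input) {
    if (!is_string($input)) {
        return null;
    }
    // \p{L} matches any Unicode letter; the /u modifier makes PCRE treat the
    // input as UTF-8, and preg_match() returns false for invalid UTF-8.
    // \z anchors at the very end, so a trailing newline is rejected too.
    if (preg_match('/^[\p{L} \'-]{1,100}\z/u', $input) !== 1) {
        return null; // reject, rather than try to repair, anything else
    }
    return $input;
}

var_dump(validate_name("O'Brien"));    // string(7) "O'Brien"
var_dump(validate_name('<b>Bob</b>')); // NULL
```

Anything outside the allowed set is refused outright, so there is no list of dangerous patterns to keep up to date.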


user42486 · Answer 2 · 2014-05-14 17:01:48Z

Please see the discussions on StackOverflow about parsing HTML with regular expressions, such as http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags. The gist of it is: you can't do it reliably. I would recommend using an actual HTML parser for this purpose, such as HTMLPurifier: http://htmlpurifier.org/
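To illustrate the parser-based idea without extra dependencies, the sketch below uses PHP's bundled DOM extension instead. This is a swapped-in technique, not HTMLPurifier's API: it parses the markup, removes script and style elements, and returns only the text. It assumes UTF-8 input and that plain text is the desired result; html_to_text is a hypothetical name. When some HTML has to be kept, a dedicated sanitiser such as HTMLPurifier remains the safer choice.

```php
<?php
// Hypothetical helper: reduce untrusted HTML to plain text by parsing it with
// PHP's DOM extension instead of matching tags with regular expressions.
function html_to_text($html) {
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings about malformed markup instead of printing them;
    // the parser recovers from bad HTML on its own.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common way to tell libxml the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop elements whose contents are code rather than readable text.
    foreach (array('script', 'style') as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        // Walk backwards because the node list is live and shrinks on removal.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }
    return $doc->documentElement ? $doc->documentElement->textContent : '';
}
```

Because the parser builds a tree first, nested or malformed tags are resolved by the same rules a browser-like parser uses, rather than by whichever regular expression happens to match first.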
