Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I'am trying to build a PHP/MySQL/jQuery comment system. Off late I started to realize that manual spamming is a serious issue that need to be addressed carefully. So I thought of building a PHP function (initially, later can be implemented in OOPS concept) that will sniff the content and give points (spam points) based on some criteria. Getting a spam point less than will make the content inactive and send for moderation, whereas posts with spam points greater than 2 won't make the content eligible to be inserted in DB. This is a very basic piece of codes, so I want you to give me your valuable suggestion as how to make this code much better and if I am doing something wrong here.

<?php 

#spam points < 2 { will be send for moderation}
#spam points > 2 { wont be posted , ie wont be inserted in DB}

function sniffSpams($content)
{
    $spam_points    =    0;
    $url_pattern    =   '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';


    preg_match_all($url_pattern, $content, $matches, PREG_PATTERN_ORDER);


    if(! empty($matches))
    {
        //url is/are present

        $get_number_of_urls    =   count($matches[0]);       //get the number of urls/emails present in content

        if($get_number_of_urls > 2)
            $spam_points += $get_number_of_urls;            //1 point per link
        else
        {
            $spam_words_uri  = array('free', 'vote', 'play');

            //if less thamn 2 , check for length of url
            foreach ($matches[0] as $url) 
            {
                if(strlen($url) > 150) //long url mostly are spam
                   $spam_points += 1;

                foreach($spam_words_uri as $spam)
                {
                    if(stripos($url, $spam) !== false ) 
                        $spam_points += 1;
                }

            }

        }

    }

   $spam_words  = array('Levitra', 'viagra', 'casino', '*');


   foreach($spam_words as $spam)
   {
        if(stripos($content, $spam) !== false ) 
            $spam_points += 1;

   }

    return $spam_points;

}



echo sniffSpams('the * dsjdjsd ');
share|improve this question
1  
You could pull the spam words from a database or text file. That way you don't have to keep change the code. –  James Feb 13 at 22:28
 
@James yep , that is the next thing , will be pulling strings (banned etc) form text file or so . Next will be wrapping this thing into a class . But what do you thing about the function ? Feasible or need some more changes ? –  CodingAnt Feb 14 at 6:50
add comment

1 Answer

Minor (and subjective) remarks:

  • Function name: The function does not sniff spam but calculate a score for the given content. Therefore I'd rename the function to something like getSpamScore
  • The variable $matches is not declared. Declaring it before using in in preg_match_all allows IDEs to perform code analysis and other developers don't have a hard time looking for its declaration.
  • Two empty lines just take up space. Your code should be clear enough from structure not to need any extra indentions (which you don't have) or double blank lines. As the preg_match_all is directly related to the if and only to it, I'd not have any lines at all between the match and the if-statement. (This applies to all other lines too)
  • //url is/are present comment: more of a personal thing maybe: I don't like obvious comments. Personally I prefer the if it needs a comment it needs to be refactored way of writing comments (non in about almost every case). Comments need maintenance time (updating comments for changed code). But time should only be spent here if it provides value. Additionally comments tend to loose their relation to the code they originally were attached to (mostly by refactoring). This greatly confuses other developers.
  • $get_number_of_urls naming: this variable contains a verb. Verbs indicate actions (e.g. a method). In PHP I'd expect this variable to be a closur, not an integer. Yet you don't actually get the number or urls, you already have them. A proper name might be number_of_urls instead.
  • Stick to one convention of styling code: in your case this especially applies for where to have whitespaces. In your if and foreach lines you mix up foreach( foreach ( and your if conditions sometimes have whitespaces in between and sometimes not. I suppose this is some kind of work-in-progress code. Yet later styling costs time. Code you write always should meet production ready requirements (regarding style). It takes some time at the beginning to stick with someones conventions, but you'll get used to it and do it automatically after some time.
  • In my opinion this function has too many responsibilities (at least two). I'd suggest splitting this function into two sub-functions. This function itself only calls and sums of the results of the independent tests. When going OO this of course wants a design pattern ;)
share|improve this answer
 
Thanks a ton for the remarks . Will be working on those you pointed out ! –  CodingAnt Feb 17 at 4:49
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.