Since I sanitize all user-provided strings before uploading them to the DB, I wanted to give users the ability to format text the way it works here on Stack Exchange or on WhatsApp:

  • *word* -> bold
  • _word_ -> italic

This class contains two functions:

  • upload(): called when the user uploads a text; it replaces *word* with <b>word</b>, _word_ with <i>word</i> and \n with <br />
  • download(): called when the user wants to modify the text; it does exactly the opposite, replacing the HTML tags with the custom markers * and _

My questions:

  1. Can this code be considered a real class in the object-oriented sense, or is it just procedural code put into a class?
  2. Would you improve it in any way?
  3. Do you have any suggestions for writing it better?

class txtFormatting {
    private $text;

    function __construct($text)
    {
        $this->text = $text;
    }

    function upload() {
        $this->text = preg_replace('/[ \t]+/', ' ', $this->text); //transforms: 2/+ whitespaces -> 1 whitespace
        $this->text = nl2br($this->text); //transforms: \n -> <br />
        $this->text = preg_replace(array("/\r\n/", "/\n\r/", "/\n/", "/\r/"), '', $this->text); 
        $this->text = explode(' ', $this->text); //each word becomes a value
        $regexAY =
            [
                '/[*]{1}[a-zA-Z0-9]+[*]{1}/' =>
                    [
                        "pattern" => "*",
                        "openTag" => "<b>",
                        "closeTag" => "</b>"
                    ],
                '/[_]{1}[a-zA-Z0-9]+[_]{1}/' =>
                    [
                        "pattern" => "_",
                        "openTag" => "<i>",
                        "closeTag" => "</i>"
                    ]
            ];

        $newText = [];
        foreach ($this->text as $key => $word) {
            foreach ($regexAY as $regex => $value) {
                if (preg_match($regex, $word)) {
                    $pattern = $regexAY[$regex]["pattern"];
                    $openTag = $regexAY[$regex]["openTag"];
                    $closeTag = $regexAY[$regex]["closeTag"];
                    $word = preg_replace('/\\' .$pattern. '(.*?)\\' .$pattern. '/', $openTag. '$1' .$closeTag, $word); // /\*(.*?)\*/ OR /_(.*?)_/
                }
            }
            if ($word !== '') { array_push($newText, $word); }
        }

        return $this->text = implode(' ', $newText);
    }

    function download() {
        /*function br2nl() {
            return preg_replace('/\<br(\s*)?\/?\>/i', "\n", $this->text); // /\<br(\s*)?\/?\>/i
        }*/
        $this->text = preg_replace('/\<br(\s*)?\/?\>/i', "\n", $this->text);
        $this->text = explode(' ', $this->text);
        $regexAY =
            [
                '/<b>[a-zA-Z0-9]+<\/b>/' =>
                    [
                        "pattern" => ["/<b>/", "/<\/b>/"],
                        "replacement" => "*"
                    ],
                '/<i>[a-zA-Z0-9]+<\/i>/' =>
                    [
                        "pattern" => ["/<i>/", "/<\/i>/"],
                        "replacement" => "_"
                    ]
            ];

        $newText = [];
        foreach ($this->text as $key => $word) {
            foreach ($regexAY as $regex => $value) {
                if (preg_match($regex, $word)) {
                    $word = preg_replace($regexAY[$regex]["pattern"], $regexAY[$regex]["replacement"], $word);
                }
            }
            if ($word !== '') { array_push($newText, $word); }
        }

        return $this->text = implode(' ', $newText);
    }
}


$text = "     This _is_ _just_ _a test_
       *text*
       so     _don't_
       consider       it just   *read*
      it";
$a = new txtFormatting($text);
echo $a->upload()."\n";

$text = "This <i>is</i> <i>just</i> _a test_<br /> <b>text</b><br /> so <i>don't</i><br /> consider it just <b>read</b><br /> it";
$b = new txtFormatting($text);
echo $b->download()."\n";

Accepted answer (score 5):

  • The names of the methods should reflect what they do, i.e. you should call them something like encode (instead of upload) and decode (instead of download).

  • You should not store encoded information in the database. Consider, for example, what happens if you want to support other output formats in the future (e.g. a PDF or whatever), or if you want * to be rendered as something else. You would then have all this HTML embedded in your data and would have to decode it. Instead, you should store the original unencoded data in the table, return the unencoded data when it is needed for editing, and encode the data only just before you need it in the encoded format. This way there is also no need for a decode (download) method. During user input, you only have to make sure the data is valid according to business rules.

  • Based on the example, it seems no <i> is produced if _ is around multiple words? (Sorry, I'm no regex expert.)
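
To make the second point concrete, here is a minimal sketch of encoding at display time only. The function name encodeForHtml is made up for this illustration, and it assumes the same single-word rule as the question's regex ([a-zA-Z0-9]+ between the markers); it is not meant as a complete formatter.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the original class): turns the raw,
 * stored text into HTML right before it is shown.
 */
function encodeForHtml(string $raw): string
{
    // Escape first, so user-supplied markup cannot reach the page.
    $html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // *word* -> <b>word</b>, _word_ -> <i>word</i> (single words only).
    $html = preg_replace('/\*([a-zA-Z0-9]+)\*/', '<b>$1</b>', $html);
    $html = preg_replace('/_([a-zA-Z0-9]+)_/', '<i>$1</i>', $html);

    // Line breaks become <br /> only in the HTML output, never in the DB.
    return nl2br($html);
}

$raw = "This _is_ *bold*\nand _a test_";
echo encodeForHtml($raw), "\n";
```

With this input, _is_ and *bold* are converted, while _a test_ is left alone because the character class does not allow a space between the markers. The database would hold $raw unchanged.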

Your note about the _ is completely correct, and it also applies to * – Vogel612 14 hours ago
    
Valuable tips! I just didn't get what you mean by raw data; or rather, I have already heard and read something about it, but my research never turned up anything useful. Can you briefly explain what it is? (For the 3rd point: yes, that's correct; at this stage I wanted my regex to allow text formatting for single words only) – brigo 13 hours ago
    
@brigo Basically, in general, one has the actual, unencoded business data, the data that users input and edit. This is the only data that should be in databases, and it is what I called raw data here (which was probably not the most accurate term). After later loading this data from the database, one would transform it (i.e. encode, escape, truncate, add formatting etc.) depending on the context in which it is used. (Though the data should of course be validated against business and integrity rules before being inserted into the table, i.e. make sure it's not empty etc.) – JanErikGunnar 13 hours ago
    
@JanErikGunnar OK, now I understand what you mean by raw data: it's the user-provided data "untouched" by any modification. In this case it would be, for example, "Hi, I'm user n. 43240 and I like ice creams"; right? Isn't it basically the same to save the data with the "more standardized" HTML tags, so that I wouldn't need any decoding process when displaying the description to other users, and would need it only in the less common case in which user n. 43240 wants to modify it? (I'm really interested in DB management, so I really appreciate hearing your point of view) – brigo 11 hours ago
    
I think there is still a little confusion :) Encode = replacing asterisks with HTML etc. You would encode when you need it to be HTML. When the user is editing it, you send what is in the database without any encoding or decoding. You never decode it. – JanErikGunnar 11 hours ago
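
The flow described in these comments could be sketched roughly as follows. Everything here is an assumption made for illustration: the array stands in for the database table, and the function names and the two encoders are hypothetical, with only the * marker handled to keep it short.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a DB table: id => raw text exactly as typed.
$descriptions = [];

// Saving stores the user's text untouched (validation is not shown here).
function saveDescription(array &$table, int $id, string $raw): void
{
    $table[$id] = $raw;
}

// Editing: hand back what is stored; there is nothing to decode.
function textForEditing(array $table, int $id): string
{
    return $table[$id];
}

// Display: encode only now, for whichever output format is needed.
function textForDisplay(array $table, int $id, callable $encoder): string
{
    return $encoder($table[$id]);
}

// Two illustrative encoders working on the same raw text.
$toHtml = fn(string $raw): string => preg_replace(
    '/\*([a-zA-Z0-9]+)\*/',
    '<b>$1</b>',
    htmlspecialchars($raw, ENT_QUOTES, 'UTF-8')
);
$toPlain = fn(string $raw): string => preg_replace('/\*([a-zA-Z0-9]+)\*/', '$1', $raw);

saveDescription($descriptions, 1, 'I like *ice* creams');

echo textForEditing($descriptions, 1), "\n";           // I like *ice* creams
echo textForDisplay($descriptions, 1, $toHtml), "\n";  // I like <b>ice</b> creams
echo textForDisplay($descriptions, 1, $toPlain), "\n"; // I like ice creams
```

Because the stored text never changes, adding another output format later only means adding another encoder, and the edit form always receives exactly what the user typed.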
