0
\$\begingroup\$

This question is about how best to encode, send and store information between JavaScript, PHP, and MySQL

I am doing a GET request from HTML/JavaScript to my PHP server, which is then storing the data in mySQL. My solution in the one case I've been using it in, and I think it should work in all cases (If it won't, please let me know?)

I got tripped up for a bit by difference between php and javascript uri/urlencoding. JavaScript encodes spaces as "%20"s, but php encodes spaces with "+". So now what I am doing is converting all spaces to pluses before I uriEncode in JavaScript

JavaScript:

docTitleInput.value = encodeURI(document.title.replace(/\s/g,"+"));

PHP: In PHP I am directly storing the encoded value in MySQL

For instance, the string: '"Problems" in myApp' is stored in MySQL as

%22Problems%22+in+myApp

I thought this might be better than storing the decoded string, as quotes and other special characters can make db querying more difficult, in my experience. On the other hand, it makes data in the db harder to read... Am I wrong in this thinking?

Somewhere else I am looking up the hash (in another column) by sending this encoded string to PHP and use PHP to query the DB with exactly what's sent from JavaScript.

$stmt = $pdo->prepare("select hash from b where c=? and d=?");

And executed where d is %22Problems%22+in+myApp

Is this the best way to do what I am doing? Am I going to come to regret storing document titles in their encoded form?

Anyone with experience in these kinds of things, I'd appreciate your insights

\$\endgroup\$
1
  • \$\begingroup\$ I'm afraid this question does not match what this site is about. Code Review is about improving existing, working code. The example code that you have posted is not reviewable in this form because it leaves us guessing at your intentions. Unlike Stack Overflow, Code Review needs to look at concrete code in a real context. Please see Why is hypothetical example code off-topic for CR? \$\endgroup\$ Commented Feb 24, 2022 at 19:56

3 Answers 3

3
\$\begingroup\$

Yeah, I have a recommendation that I think will save you a few headaches.

Understand what the unencoded string is: where a space is just a space (not a %20 or a +). And importantly, an ampersand is an ampersand (not &&), a 💩 is a 💩 (not that various encodings of 💩 or "\uD83D\uDCA9" or \01f4a9 or %F0%9F%92%A9), etc. When in a programming language, don't carry around an encoding for a different medium-- it gets confusing too fast. PHP and Javascript strings should be able to hold "unencoded" unicode strings, but calling this "unencoded" is deceptive, as they are simply encoded in a specific way, for that language. You could say, "javascript encoded" or "php encoded". If you do that, you see why it's then confusing to say "javascript encoded and then url query parameter encoded".

(A tangential point: URL encodings are hard because different pieces of a URL have slightly different encodings.)

Once you have your strings in a "native" format for a given programming language, then it becomes clear that you need to encode it appropriately for that medium as you use it. To build a URL, you'll need to use the appropriate function to encode a given piece of the URL. Or, for the database, use the database driver encoding functions, which do the proper thing like escaping quotes and getting unicode in the expected way. Likewise, there will be corresponding tools to decode the pieces when received. Do this as they are marshalled into Javascript or PHP.

In this way you are just dealing with one translation of a string at a time. If you later then need to query the database for a given string, it's straightforward how to do this. And, if you're outputting it onto a web page, it's clear that you need to encode the <, >, and sometimes &.

Honestly, you're not saving yourself any effort by carrying around URL encoded strings in the database. Also, if you keep it simply like this, when something goes wrong, it's pretty easy to diagnose. With layers of encoding it can be a complex logic puzzle. I hope this helps!

\$\endgroup\$
3
\$\begingroup\$

Here are some tips:

  1. Refrain from pre-encoding data before you store it in a database. You mentioned you're doing this because it makes queries easier - I don't know if you're referring to programmatic querying or writting one-off custom quering, but I'll address both of these points.
  • Programatic quering: The $pdo->prepare() function you're using will handle special characters in a perfectly safe way. Just make sure you always stick to using these sorts of query-construction functions (as apposed to building queries via string interpolation), and you should always be safe from XSS attacks and what-not.
  • one-off queries: It's the creators of SQL's job to provide a user-friendly way to write queries with special characters. If you're choosing to percent-encode data just to make it easier to write these queries, you're basically saying that SQL didn't do a great job at this, and you want to replace their solution with your own. This line of thinking isn't bad at all (often the tools we use don't do the best at handling whatever use cases we have), but it should raise a warning flag and cause us to think twice before continuing. In this case, I would recommend just storing the data unencoded, and getting comfortable with writting queries that use special characters.
  1. URLSearchParams() is actually intended for encoding other parts of a URL, like the path. Query-parameters are generally form-encoded, which follows an older version of the percent-encoding standard. If you want to properly encode query parameters, use JavaScript's URLSearchParam's API instead, e.g. new URLSearchParams({ param: 'this has spaces' }).toString() === 'param=this+has+spaces'. See here to learn more about these two different versions of percent encoding, and here to learn how to properly encode query parameters in JavaScript.
\$\endgroup\$
1
\$\begingroup\$

PHP gives you several tools for encoding/decoding, but fails to say when to use each. Here's a crude list:

urlencode - only for building the key or value of a URL string
htmlentities - only for building HTML that might have 8-bit codes
mysqli_real_escape_string or addslashes
     only when building MySQL SQL without using "binding"

Do not mix any of the above in the same expression.
Do not use any of the matching 'decoders'.

Semi-related: These are the only encode/decode routines that I have found use for:

json_encode(..., JSON_UNESCAPED_UNICODE) and json_decode(...)
serialize and unserialize -- useful with SESSIONs

Do not use any mb* functions. (I invite anyone to present a use for them.)

\$\endgroup\$
2
  • 3
    \$\begingroup\$ Despite the fact that I have enormous respect for your proven expertise across the Stack Exchange Network and your personal website offerings. I am going to need more concrete support for these advices if I am to get on board. Perhaps that hardest is Do not use any mb* functions . Please qualify why you are giving this advice. Please edit this answer. P.S. I know what you mean with "Do not [...] other [...] other [...]", but it reads a little weird. Can you rephrase? \$\endgroup\$ Commented Feb 14, 2022 at 22:36
  • 1
    \$\begingroup\$ @mickmackusa - I admit that I was going out on a limb to say all of these things. I would not be surprised for someone to insist on the need for something that I said not to use; I will welcome that discussion. I do not have anything any "concrete". I have been chasing character set issues since MySQL expanded past "latin1" nearly 2 decades ago. I look at the "mb" functions periodically and realize that they are "not appropriate" for whatever I am chasing. \$\endgroup\$ Commented Feb 15, 2022 at 2:46

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.