Questions from the Field: Should I Escape My Input, And If So, How?
At last weekend's PHP Benelux I gave a tutorial titled "From SQL to NoSQL". Large parts of the tutorial covered using MongoDB: how to use it from PHP, schema design, etc. I ran a little short of time, and since then I've been getting some questions. One of them was: "Should I escape my input, and if so, how?". Instead of trying to cram my answer into 140 characters on Twitter, I thought it'd be wise to reply with this blog post.
The short answer is: yes, you do need to escape.
The longer answer is a bit more complicated.
Unlike with SQL, inserting, updating and deleting data, as well as querying data, does not require the creation of strings in MongoDB. All data is always used as a variable or a constant. Take for example:
<?php
$c = (new MongoClient())->demo->col;
$c->insert( [ 'name' => $_GET['name'] ] );
?>
Because we don't need to create a string with the full insert statement, there is no need to escape characters such as ' to prevent issues like SQL injection. The context in which variables are used is immediately clear.
But be aware that PHP's request parameters (GET, POST, COOKIE, and others) allow you to send not only scalar values, but also arrays. With the example code above in mind, if we request the URL http://localhost/script.php?name[first]=Derick&name[last]=Rethans, we end up inserting the following document into the collection:
[ 'name' => [
'first' => 'Derick',
'last' => 'Rethans'
] ]
And this is probably not what you had in mind.
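To see how this happens without a web server, the sketch below feeds the same query string to parse_str(), which parses a string as if it were the query string of a URL. The $get variable stands in for $_GET, and the is_string() check is just one possible guard, not the only way to handle it.

```php
<?php
// Parse the query string from the URL above; $get stands in for $_GET.
parse_str('name[first]=Derick&name[last]=Rethans', $get);

// $get['name'] is now an array, not a string.
var_dump(is_array($get['name']));   // bool(true)

// One possible guard: only accept the value if it really is a string.
$name = is_string($get['name']) ? $get['name'] : null;
var_dump($name);                    // NULL
```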
The same trick is possible when doing queries. Look at this code:
<?php
$c = (new MongoClient())->demo->col;
$r = $c->findOne( [
'user_id' => $_GET['uid'],
'password' => $_GET['password']
] );
?>
If we now request the URL http://localhost/script.php?uid=3&password[$ne]=foo, we end up doing the following query:
<?php
$c = (new MongoClient())->demo->col;
$r = $c->findOne( [
'user_id' => '3',
'password' => [ '$ne' => 'foo' ]
] );
?>
The password clause in that query matches every document whose password is not 'foo', so it will very likely always match. Of course, if you are not storing passwords as a hash, you have other problems too! This is just a simple example to illustrate the problem.
This same example highlights a second issue: all scalar request parameters are represented as strings in PHP. Hence my use of '3' instead of 3 in the above example. MongoDB treats '3' and 3 differently while matching, and searching for 'user_id' => '3' will not find documents where 3 is stored as a number. I wrote more extensively about that before.
So although MongoDB's query language does not require you to build strings, and hence "escape" input, you do have to make sure that the data is of the correct data type. For example you can do:
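The type difference is easy to demonstrate in plain PHP. The sketch below uses parse_str() as a stand-in for what ends up in $_GET, and shows filter_var() with FILTER_VALIDATE_INT as one way to get a real integer, or null when the input is not a valid integer.

```php
<?php
// $get stands in for $_GET after requesting ?uid=3
parse_str('uid=3', $get);

var_dump($get['uid']);         // string(1) "3"
var_dump($get['uid'] === 3);   // bool(false): the string and the integer differ

// FILTER_NULL_ON_FAILURE makes filter_var() return null for invalid input.
$uid = filter_var($get['uid'], FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
var_dump($uid);                // int(3)
```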
<?php
$c = (new MongoClient())->demo->col;
$r = $c->findOne( [
'user_id' => (int) $_GET['uid'],
'password' => (string) $_GET['password']
] );
?>
For scalar values, a cast like the ones above is often the easiest option, but be aware that casting an array does not fail: you end up with the string 'Array' or the number 1 instead.
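The sketch below shows what these casts do when the request parameter turns out to be an array. The $password value is hard-coded here to mimic an array-valued request parameter, and the @ operator is only there to hide the "Array to string conversion" diagnostic that PHP raises for the string cast.

```php
<?php
// Mimics $_GET['password'] when an array is sent instead of a string.
$password = ['$ne' => 'foo'];

$asString = @(string) $password;   // the literal string "Array"
$asInt    = (int) $password;       // 1, because the array is not empty

var_dump($asString, $asInt);
```

Neither result is the attacker's operator any more, but neither is a meaningful value to query with.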
In most cases, it means that if you want to do things right, you will need to check the data types of GET/POST/COOKIE parameters, and cast, convert, or bail out as appropriate.
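As a minimal sketch of "check, cast, or bail out", consider the two helpers below. requireString() and requireInt() are hypothetical names made up for illustration; they are not part of PHP or the MongoDB driver. They take the parameter array explicitly so the code runs without a web request; in a real script you would presumably pass $_GET.

```php
<?php
// Hypothetical helper: return the parameter only if it is a plain string.
function requireString(array $params, string $key): string
{
    if (!isset($params[$key]) || !is_string($params[$key])) {
        throw new InvalidArgumentException("Parameter '$key' must be a string");
    }
    return $params[$key];
}

// Hypothetical helper: return the parameter as an int, or bail out.
function requireInt(array $params, string $key): int
{
    $value = filter_var($params[$key] ?? null, FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
    if ($value === null) {
        throw new InvalidArgumentException("Parameter '$key' must be an integer");
    }
    return $value;
}

// Simulates a request where uid is a string and password is an array.
$params = ['uid' => '3', 'password' => ['$ne' => 'foo']];

$uid = requireInt($params, 'uid');   // integer 3

try {
    $password = requireString($params, 'password');
} catch (InvalidArgumentException $e) {
    $password = null;                // the array-valued password is rejected
}
```

Whether to throw, return an error page, or fall back to a default is a design choice; the point is only that the array never reaches the query.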



Shortlink
This article has a short URL available: https://drck.me/escinput-bm4