A good discussion came up on IRC today about sanitizing data.
I’d like to clear up a little confusion and hopefully set the record straight.
First, we need to keep in mind that there are two types of “evil” data that most web developers worry about.
One can hurt your database and the other can hurt your HTML pages.
Therefore, it is generally advisable to protect your web application against:
1. SQL injection
2. XSS, or cross-site scripting.
It is important, however, not to mix up the two, and to realize at which point it is appropriate to protect yourself (or your app) from each.
First, the good news. With CakePHP you are automatically protected from SQL injection IF you use CakePHP’s ORM methods (such as find() and save()) and proper array notation as outlined in the manual. The bad news is that if you write plain SQL by hand or do not follow CakePHP’s query syntax, you could still be vulnerable to an SQL injection attack.
To be a little more specific, CakePHP will quote and escape all fields and values in your queries, assuming you’re following the rules, to keep your DB nice and safe.
(If you care about more details, see here).
It is important to note that updateAll() does not escape the fields it is given, so be cautious when using it, especially with user-supplied values.
So, technically you have to worry very little about SQL injection when working with CakePHP.
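To show why quoting and escaping matters, here is a rough sketch in plain PHP rather than CakePHP. It assumes PDO with the SQLite driver is available, and the users table and the hostile input are made up for the example. CakePHP's own internals differ; the point is only the contrast between pasting a value into the SQL string and passing it as data, which is roughly what array notation such as 'conditions' => array('User.name' => $name) buys you.

```php
<?php
// Illustration only: plain PDO on an in-memory SQLite database (an assumption),
// not CakePHP's actual implementation.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");
$db->exec("INSERT INTO users (name) VALUES ('alice')");
$db->exec("INSERT INTO users (name) VALUES ('bob')");

// Hypothetical hostile input typed into a search box.
$evil = "nobody' OR '1'='1";

// Unsafe: the value is pasted into the SQL string, so the quote in the input
// closes the string literal and the OR clause matches every row.
$unsafe = $db->query("SELECT name FROM users WHERE name = '$evil'")
             ->fetchAll(PDO::FETCH_COLUMN);

// Safe: the value is bound as data and is compared literally against name.
$stmt = $db->prepare('SELECT name FROM users WHERE name = ?');
$stmt->execute(array($evil));
$safe = $stmt->fetchAll(PDO::FETCH_COLUMN);

echo count($unsafe), " row(s) returned by the concatenated query\n";
echo count($safe), " row(s) returned by the bound query\n";
```

The concatenated query returns every user, while the bound one returns nothing, because nobody is actually named "nobody' OR '1'='1".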
Now, what about a user entering some evil scripts into your blog to hijack your site?
I think the most important point here is that any HTML script, tag or trick is perfectly safe to store in your database as is.
The basic rule is that you do not need to sanitize HTML data in any way before saving it into the database.
When you are concerned about XSS, it is the output of data from the database that you need to worry about. If you display raw data from the DB, it is quite possible that you’ll have “evil scripts” injected into your page. With CakePHP this is easy enough to combat with the handy h() function: wrap any value in it at the point where you echo it into the page.
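As a minimal sketch of what escaping on output looks like, here is a plain PHP stand-in. As far as I know, h() is a thin convenience wrapper around PHP's htmlspecialchars(); the escapeHtml() function below is a hypothetical name used so the snippet runs without CakePHP, and the comment text is made up.

```php
<?php
// Hypothetical stand-in for CakePHP's h(), so this runs without the framework.
function escapeHtml($text)
{
    // Convert <, >, &, and both quote characters into HTML entities.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Pretend this came straight out of the database, stored exactly as submitted.
$comment = '<script>alert("xss")</script>';

// Escape at output time, right where the value is written into the page.
$escaped = escapeHtml($comment);
echo $escaped, "\n";
// prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

The browser shows the script as harmless text instead of running it, and the stored value stays untouched.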
Are there cases where you must sanitize HTML before saving it to the database?
Probably, but I can’t really think of any.
Even if your application requires it, you might want to create two fields in the relevant table, for example raw_data and sanitized_data.
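A rough sketch of the two-field idea, in plain PHP: the buildProfileFields() helper and the 'suspicious' flag are hypothetical names invented for this example, and escaping with htmlspecialchars() is only one possible choice for the sanitized copy.

```php
<?php
// Hypothetical helper: takes one user submission and prepares both columns.
function buildProfileFields($input)
{
    return array(
        // Stored exactly as the user typed it, for auditing later.
        'raw_data' => $input,
        // Escaped copy; escaping is one option, your app may need another.
        'sanitized_data' => htmlspecialchars($input, ENT_QUOTES, 'UTF-8'),
        // Crude, illustrative check for an opening script tag.
        'suspicious' => stripos($input, '<script') !== false,
    );
}

$fields = buildProfileFields('Hi <script>steal()</script>');

echo $fields['sanitized_data'], "\n";
echo $fields['suspicious'] ? "flag this user\n" : "looks fine\n";
```

The resulting array could then be handed to save() in the usual way, assuming your table has matching columns.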
P.S. AD7six just posed an excellent question, which he also answered just as well… Why allow script code to be stored in some user’s profile? The answer is that having this raw (although unwanted) data is useful for tracking and dealing with malicious users. That’s where having two fields in the table to store sanitized and raw data might come in very handy. If you remove the script tags before storing the user information, it becomes much harder to find and flag users who are obviously trying to screw with your app.