August 17, 2006

Leveraging powerful data validation in SQL Server 2005

I am so pleased with deciding to go with SQL Server 2005 Express for a recent project. I recently learned about an extremely powerful feature that makes data validation in the database a breeze.

Whenever you use input from an untrusted source, it needs to be validated. Especially if it comes from or can be accessed by the user. The best way to handle this is to put an input sentry at any trust boundary, as it crosses from an untrusted to trusted border. Ultimately, the last line of defense will be the database, as that is where the final storage ends up... at least for our application.

You can easily apply CHECK constraints on fields in the database. But that is a very rudimentary method of validating the input, since you can typically only do basic checks.

Enter the fact that in SQL Server 2005, you can now enable CLR in the database, and write user-based functions in your favorite .NET language. And more importantly, you can CALL these functions AS constraints on fields in the database.

This is really impressive stuff. In my case, I wrote a generic regular expression validation function that allows me to do the deepest of validation checks on the data before its inserted. If the data fails the regex validation, the record will not be committed.

I decided to screencast the authoring of this powerful regular expression validation method. Feel free to use it yourself on your SQL Server 2005 databases. You can view the screencast here.

And remember... always assume that input is malicious until proven to be safe.

Posted by SilverStr at August 17, 2006 11:25 PM | TrackBack
Comments

I never understood why regular expressions weren't included as a native feature of T-SQL. They're so powerful for this kind of thing.

And in fact, there are generally very few (valid) reasons OTHER THAN regexs to want to SQLCLR anyway.

Posted by: Dean Harding at August 17, 2006 11:50 PM

It sounds like an interesting feature, but surely you shouldn't be allowing direct access to the database from an untrusted source? Imho you should have an abstraction layer (e.g. .NET web service?) above the database which does all of the checks and validations. What happens if you want to move to an Oracle database when your database gets big, or how about clustering?

Also, what's the performance implication of this?

I guess my question is: 'At what point should you use a SQL Server user defined function over doing it in your code ahead of persistance?'
I suppose that all comes down to whether you want business logic (?) in your data layer?

Posted by: Ed at August 20, 2006 07:10 AM

Hey Ed,

No, you wouldn't give direct access to an untrusted source. However, that doesn't mean you don't defend against the possibility that data may be tainted, even from a PERCEIVED trusted source. As an example, an ASP.NET app may be given access to the database. Theory is you should do data validation there. But what if you forget or miss it? If you have the ability to treat even your trusted app as untrusted, you can better defend against attack vectors you don't know about yet.

If a need to move to a larger db exists, I will logically go to SQL Server 2005... which could easily handle the load. Oracle doesn't support such useful mechanisms.

As for performance implications, I wouldn't run such code on records and queries that occur frequently. However, in my case I applied the regex on INSERT queries on items such as Users and Tokens which happens quite infrequently. Very little performance issues at that point.

I don't believe that data validation is a business logic issue. I believe its a data issue... and should be in the data layer.

Excellent feedback. Thanks!

Posted by: Dana Epp at August 20, 2006 04:30 PM