November 26, 2003

Powers of Regex() in C#

Well, today has been a good coding day. I just refactored a section of my code from just over 900 lines to just under 100, thanks to C#'s Regex() class. I am so bloody proud of that, I just had to tell the world.

Well more to the point, I wanted to blog it so others can learn from my mistakes.

I won't go directly into my code, but will quickly state that it was 900 lines of lexical analyzer goodness... or badness... depending on who is reading it. I was reading in a custom configuration file and trying to break it down into smaller tokens so I could deal with it. The parser was huge... mostly because this code was ported from C, where I HEAVILY used pointers to deal with the slide and compare routines.

Now enough about my code... and onto why C# Regex() rocks. I don't need to discuss WHY Regex() is important (I have done that before, and you got that you need to treat all input as malicious until validated otherwise, RIGHT???)... but I want to teach you about a neat little feature that makes it a $DEITY-send. It's called named groupings. When the regular expression is run against your input, each named group construct captures the substring it matched. What is nice about this approach is that you can use it to find an exact pattern match, and then break that match down into its child substrings directly without having to parse it out yourself. Pretty sweet if you ask me! Everything is stored in the resulting Match.Groups collection, which can be indexed by group name. The construction of a named group is quite easy... it's just (?<named_group>expression).

Let me show you a simple example of how to use this. Let's say you want to parse out a simple line that holds a string value, followed by a numeric ulong value in round brackets, i.e. foo(1)

Here is how you would do that:


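using System.Text.RegularExpressions;

string line = "foo(1)";
// The literal brackets are escaped as \( and \); each named group
// still needs its own unescaped ( ) around it. \d+ allows multi-digit values.
Regex pattern = new Regex( @"^(?<my_str>\S+)\((?<my_ulong>\d+)\)$", RegexOptions.IgnoreCase );
Match match = pattern.Match( line );
if( match.Success )
{
    // Pull each captured substring out by its group name
    string parsed_str = match.Groups["my_str"].Value;
    ulong parsed_num = ulong.Parse( match.Groups["my_ulong"].Value );
}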

Ya ya ya... I should be catching the Parse() exception... but I left that out for clarity. You get the idea though. Within a few lines you get properly validated data directly through the Regex()!
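For the curious, here is a rough sketch of what handling that Parse() failure might look like. This is not the code from my parser: the LineParser class and TryParseLine method are names made up for illustration, and I am assuming a pattern of the form ^(?<my_str>\S+)\((?<my_ulong>\d+)\)$ where the number can be more than one digit. It swaps ulong.Parse() for ulong.TryParse(), since a run of digits can pass the regex and still be too big for a ulong.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not from the original parser.
public static class LineParser
{
    // Assumed pattern: a non-whitespace name, then a number in literal brackets.
    private static readonly Regex Pattern =
        new Regex( @"^(?<my_str>\S+)\((?<my_ulong>\d+)\)$" );

    // Returns true only if the line matches AND the number fits in a ulong.
    public static bool TryParseLine( string line, out string name, out ulong number )
    {
        name = null;
        number = 0;

        Match match = Pattern.Match( line );
        if( !match.Success )
        {
            return false;
        }

        // The regex only proves these are digits; the value may still
        // overflow a ulong, so use TryParse instead of letting Parse throw.
        if( !ulong.TryParse( match.Groups["my_ulong"].Value, out number ) )
        {
            return false;
        }

        name = match.Groups["my_str"].Value;
        return true;
    }

    public static void Main()
    {
        string name;
        ulong number;

        if( TryParseLine( "foo(1)", out name, out number ) )
        {
            Console.WriteLine( name + " = " + number );
        }

        // 20 nines is larger than ulong.MaxValue, so this is rejected
        // even though the regex itself matches.
        if( !TryParseLine( "foo(99999999999999999999)", out name, out number ) )
        {
            Console.WriteLine( "rejected: number out of range" );
        }
    }
}
```

The point is that the regex validates the shape of the input, while TryParse() covers the one thing the pattern can't check, which is the range of the number.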

Now... let's make this even easier. I found a sweet tool from Rad Software called RegEx Designer that is perfect for building these regular expressions with named groupings, and it's FREE! It allows you to quickly test different regexes and see the results immediately. Thanks for the tool guys!

All in all this has made my day. Any time you can reduce the amount of code and thus reduce the potential bug surface... you are having a great day. Especially when I shrunk it by a factor of 9! And it's quite easy to review and manage... which makes it all the more interesting.

So if you haven't had a chance to check it out... give it a try. Regex() and "named groupings"... a great combination!

Posted by SilverStr at November 26, 2003 03:59 PM
Comments

interesting, on many levels... I was googling around looking for some Regex examples in C# and found this log... interesting in that it's pretty much the regex feature I was curious about and this log was entered on my birthday... so thanks for the present

Posted by: Jeff at December 21, 2003 09:43 AM

Hey, I'm glad it was useful to you! And happy belated birthday! :)

Posted by: SilverStr at December 21, 2003 11:09 AM

Great little article :)
I was googling for Regex in C# as well.
This helped. Thanks!
:)

Posted by: Ron at October 24, 2004 12:45 PM