Wednesday, June 30, 2010

Learning regular expression step by step tutorials: Step 3

Free Microsoft MCTS 70-536 Examination Training and Preparation


If you haven't read the previous posts, check out the Related Topics listed further down.

Character Classes or Sets

If you want the regex engine to match any one of several characters, you can form a character class (also called a character set). To form one, place the characters in square brackets. For example, [a-z] matches any single lowercase letter from a to z, and [ghi] matches any one of the characters listed in the brackets (g, h or i).
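
As a small illustration of the two character classes above, here is a sketch using Python's built-in re module. The sample strings and variable names are my own choices for the demonstration, not part of the original tutorial.

```python
import re

# [a-z] matches any single lowercase letter from a to z.
lowercase = re.findall(r"[a-z]", "Regex 101")

# [ghi] matches a single g, h or i, and nothing else.
ghi = re.findall(r"[ghi]", "highlight")

print(lowercase)  # ['e', 'g', 'e', 'x']
print(ghi)        # ['h', 'i', 'g', 'h', 'i', 'g', 'h']
```

Each item in the result is one character, because a character class on its own always matches exactly one character.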

Negated Character Classes

If you want the regex engine to match any character except those in a given range or set, use a negated character class. To form one, add ^ (a caret) immediately after the opening square bracket. Some examples:

[^ghi] matches any single character other than g, h and i.
[^a-z] matches any single character other than the lowercase letters a to z.
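
The same two negated classes can be tried out in Python. This is only a sketch with sample strings chosen for the demonstration.

```python
import re

# [^ghi] matches any single character that is NOT g, h or i.
not_ghi = re.findall(r"[^ghi]", "highlight")

# [^a-z] matches any single character that is NOT a lowercase letter,
# so uppercase letters, spaces and digits all qualify.
not_lower = re.findall(r"[^a-z]", "Regex 101")

print(not_ghi)    # ['l', 't']
print(not_lower)  # ['R', ' ', '1', '0', '1']
```

Note that a negated class still has to match a character; it does not match "nothing".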

Related Topics

Step 1 Learning regular expressions
Step 2 Learning regular expressions

Shorthand Character Classes


In the previous post we saw that a backslash combined with a literal character (one that is not a special character) creates a regex token with a special meaning of its own.
For example:
when "d" is combined with a backslash ("\"), it creates the regex token "\d", which matches any single digit from 0 to 9.



The above regex token is an example of a shorthand character class. Shorthand character classes exist as abbreviations for the character classes that are used most often. Other examples are:

"\w" stands for a word character, which can be defined as [a-zA-Z0-9_]
"\s" stands for a whitespace character, which can be defined as [ \t\r\n] (space, tab, line break)

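The shorthand classes \d, \w and \s can be checked with a short Python sketch. The sample strings are assumptions for the demonstration. In Python 3 these shorthands also match non-ASCII digits, letters and spaces by default, so the sketch passes re.ASCII to keep them close to the bracket definitions given above.

```python
import re

text = "Room 42, floor_3\tEast"

# \d matches a single digit (0-9 when re.ASCII is set).
digits = re.findall(r"\d", text, flags=re.ASCII)

# \w matches a word character: a letter, a digit or an underscore.
word_chars = re.findall(r"\w", "a_1 b-2", flags=re.ASCII)

# \s matches a whitespace character such as a space or a tab.
spaces = re.findall(r"\s", text, flags=re.ASCII)

print(digits)      # ['4', '2', '3']
print(word_chars)  # ['a', '_', '1', 'b', '2']
print(spaces)      # [' ', ' ', '\t']
```
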

Repeating Character Classes 

You can repeat a character class using the ?, * or + operators: ? means zero or one time, * means zero or more times, and + means one or more times. For example, you can use the "+" operator with a character class like this: [a-c]+

The above expression tells the regex engine to match any of the characters a, b or c one or more times in a row. This means [a-c]+ finds matches in jukibyha, since the input contains "b" and "a".
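
To see the three repetition operators at work on a character class, here is a short Python sketch. The input jukibyha comes from the text above; the other sample strings and the variable names are made up for the demonstration.

```python
import re

# + : one or more characters from the class in a row.
plus_single = re.findall(r"[a-c]+", "jukibyha")
plus_run = re.findall(r"[a-c]+", "xabcaz")

# * : zero or more characters from the class between x and y.
star = re.findall(r"x[a-c]*y", "xy xaby xdy")

# ? : zero or one character from the class between x and y.
optional = re.findall(r"x[a-c]?y", "xy xay xaby")

print(plus_single)  # ['b', 'a']
print(plus_run)     # ['abca']
print(star)         # ['xy', 'xaby']
print(optional)     # ['xy', 'xay']
```

In the second result, each repetition picks its own character from the class, which is why the mixed run "abca" counts as a single match.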

However, you will often want to check whether the same character is repeated, as in "aa", "bbb" or "cc". The expression [a-c]+ cannot do that on its own, because each repetition may pick a different character from the class, so it also matches mixed runs such as "aba", "bbc", "baa" and "cca". To check for a repeated matched character we need to use backreferences. If it is not clear what I mean by a repeated matched character, here is an explanation:

If my regular expression is [0-3] and the input is 568935413, the regex engine finds the first match "3" and then stops, even though the "1" and the second "3" later in the input are also valid matches. So "3" is now the character matched by the regex engine.
If you want the regex engine to look for that same matched character again, put the character class in parentheses to form a capturing group, which tells the engine to save the matched text in its memory, and then refer to it with the backreference \1. The engine finds a match such as "3", saves it, and then requires the very same character at the position where \1 appears.

So if you want to look for repeated matched characters, you can create a regular expression like this:
([0-3])\1+
This tells the regex engine to match a character from 0 to 3 followed immediately by one or more copies of the same character, as in "33" or "111". The input 568935413 has no match for it, because its two 3s are not next to each other.
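
Here is a minimal Python sketch of the backreference pattern. The input 568935413 is the one used above; the second input, 5689335411, is made up for the demonstration. Keep in mind that \1 has to match directly after the captured character, so only adjacent repeats are found.

```python
import re

# A plain search with [0-3] stops at the first matching character.
first = re.search(r"[0-3]", "568935413")
print(first.group(), first.start())  # 3 4

# ([0-3]) captures one digit from 0 to 3; \1+ then requires one or
# more copies of that same digit immediately afterwards.
repeated = re.compile(r"([0-3])\1+")

# The two 3s in this input are not adjacent, so there is no match.
no_match = repeated.search("568935413")
print(no_match)  # None

# finditer returns match objects, so group(0) gives the whole run.
# (re.findall would return only the captured group here.)
runs = [m.group(0) for m in repeated.finditer("5689335411")]
print(runs)  # ['33', '11']
```
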

 


 
Check back for the next tutorial on regular expressions.
Regards
Sameer Shaik