
03-24-2009: PHP Regular Expressions tutorial added to the PHP tutorials section.


Regular Expressions


Introduction
Basic Syntax
Four Basic Symbols
Brace Yourself
Or Operator
The Usefulness of Brackets
The Backslash in PHP
Regular Expressions Examples

 

Introduction

Regular Expressions might as well be called Nightmare Expressions, since mastering them is anything but easy. However, programmers should keep in mind that avoiding regular expressions often leads to chunks of code that are more tedious to write than any regular expression.


 

Basic Syntax

If you are using Linux, the egrep command is a convenient way to experiment with regular expressions. If you aren't using Linux, you can still follow what's going on. Open the Linux console, create a file called band_names (with touch, for example), and add these names to it, one per line:

Iced Earth
Brocas Helm
Judas Priest
Cauldron Born
Iron Maiden
Black Sabbath
Dokken
Blue Oyster Cult
Accept
Deep Purple
Nightwish
UDO
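
The file can also be created in one step. The following is a minimal sketch, assuming a POSIX-like shell; the variable name line_count is only an illustration.

```shell
# Write one band name per line into a file called band_names.
printf '%s\n' \
  'Iced Earth' 'Brocas Helm' 'Judas Priest' 'Cauldron Born' \
  'Iron Maiden' 'Black Sabbath' 'Dokken' 'Blue Oyster Cult' \
  'Accept' 'Deep Purple' 'Nightwish' 'UDO' > band_names

# Count the lines to confirm that all twelve names were written.
line_count=$(wc -l < band_names | tr -d ' ')
echo "$line_count"
```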

If you like hard rock, some of these names will sound familiar to you. But the purpose of this article is not to test your knowledge of rock music. Suppose you want to know if the heavy metal band Cauldron Born is in the list. This is what you type:

cat band_names | egrep -i 'cauldron'

The output will be: Cauldron Born

Here -i is a command-line option that tells egrep to ignore case. If we typed egrep -i 'born' instead, the output would be the same. Sometimes, however, we want to know if a line begins with a certain word. Typing cat band_names | egrep -i '^born' yields no output, since the line begins with the word Cauldron, not Born. Anything that follows the special symbol '^' must be at the start of the line. Therefore, cat band_names | egrep -i '^cauldron' will output Cauldron Born.

But what if we want to find a word or phrase at the end of a string or expression? The symbol '^' would be useless. What we now need is the symbol '$', like this: cat band_names | egrep -i 'priest$'

And the output will be the name of a legendary heavy metal band: Judas Priest.

Now we know that ^ and $ mark the start and the end of a line. Using both, as in cat band_names | egrep -i '^deep purple$', matches only a line that consists of exactly that text, so it will produce the name Deep Purple.
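
The anchor examples can be collected into one short script. This is a sketch under a few assumptions: it uses grep -E, which is the current spelling of egrep, it reads the file directly instead of piping it through cat, and the variable names are made up for the illustration.

```shell
# Recreate the band_names file so the example is self-contained.
printf '%s\n' 'Iced Earth' 'Brocas Helm' 'Judas Priest' 'Cauldron Born' \
  'Iron Maiden' 'Black Sabbath' 'Dokken' 'Blue Oyster Cult' \
  'Accept' 'Deep Purple' 'Nightwish' 'UDO' > band_names

# No anchor: 'born' may appear anywhere in the line.
anywhere=$(grep -Ei 'born' band_names || true)
# '^' anchor: the line must start with 'born', so nothing matches.
at_start=$(grep -Ei '^born' band_names || true)
# '$' anchor: the line must end with 'priest'.
at_end=$(grep -Ei 'priest$' band_names || true)
# Both anchors: the whole line must be 'deep purple'.
whole_line=$(grep -Ei '^deep purple$' band_names || true)

echo "anywhere:   $anywhere"
echo "at_start:   $at_start"
echo "at_end:     $at_end"
echo "whole_line: $whole_line"
```

The "|| true" only keeps the script going when grep finds nothing, since grep reports "no match" through a non-zero exit status.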

 

Four Basic Symbols

The symbols '.', '*', '+', and '?' are extremely useful in regular expressions. We begin with the simplest one: the dot. It stands for 'any single character'. Let's see a few examples using egrep:

egrep -i 'a.t' band_names --> outputs: Iced Earth

egrep -i '^i.o' band_names --> outputs: Iron Maiden

egrep -i 'e.m$' band_names --> outputs: Brocas Helm

egrep -i 'o..e' band_names --> outputs: Dokken

The symbol '*' works differently from '.': it means 'zero or more', and this meaning applies to the single character (or group) that precedes it. Hence the regular expression 'a*t' means zero or more a's followed by a 't' (t, at, aat, aaat, aaaat...). If we have the expression 'ab*', anything from 'a' to 'abb' to 'abbbbbbbbbbbbbbbbb' (or even more b's) can be a match. It is worth mentioning that '.*' stands for zero or more characters of any kind.

egrep -i 'a.*t' band_names ---> outputs: Iced Earth, Judas Priest, Black Sabbath, Accept

The plus sign '+' is a little less forgiving than '*' since it means 'one or more'. That being the case, 'ab+' can match 'ab', and it can match 'abbb' (or even more b's), but it never matches 'a' alone.

Finally we have '?'. The nature of this symbol in everyday writing should tell you, more or less, what it stands for in regular expressions. The expression 'ab?', for example, means there's an a, and it may or may not be followed by a single b.
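
To compare '*', '+', and '?' side by side, here is a small sketch. The three sample strings and the variable names are invented for the illustration, and the patterns are wrapped in ^ and $ so that each one has to describe the whole line.

```shell
# Sample strings (made up for this illustration), one per line.
samples=$(printf '%s\n' 'a' 'ab' 'abbb')

# 'ab*': an a followed by zero or more b's.
star=$(printf '%s\n' "$samples" | grep -E '^ab*$' || true)
# 'ab+': an a followed by one or more b's.
plus=$(printf '%s\n' "$samples" | grep -E '^ab+$' || true)
# 'ab?': an a followed by zero or one b.
optional=$(printf '%s\n' "$samples" | grep -E '^ab?$' || true)

echo "star:     $(echo "$star" | tr '\n' ' ')"      # a ab abbb
echo "plus:     $(echo "$plus" | tr '\n' ' ')"      # ab abbb
echo "optional: $(echo "$optional" | tr '\n' ' ')"  # a ab
```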

 

Brace Yourself

Symbols like '*', '+', '.', and '?' can be very useful, but sometimes we need more precision in what we are searching for. We already know that 'ab+' can match 'ab' as well as 'abbbbbbbb', but what if we are searching for exactly two b's? In cases like this, braces {} are very useful. Braces are more precise than '*' and '+' because they tell egrep how many times the preceding character must repeat. Therefore, abb can be replaced by ab{2}, and ab* can be replaced by ab{0,}. Notice that there's no number after the comma, which means there is no upper limit: the a may be followed by any number of b's, including none. You can be more specific than this and type ab{0,2} (a, ab, or abb) or ab{1,2} (ab or abb).

echo 'Abba is great' | egrep -i 'b{2}a' --> Output will be Abba is great

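echo 'Abbba is great' | egrep -i 'b{2}a' --> Output will be Abbba is great. The pattern is not anchored, so egrep still finds 'bba' inside 'Abbba'. To reject a third b, say what must come before the b's: egrep -i 'ab{2}a' produces no output for this line.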

echo 'Abba is great' | egrep -i 'b{3}a' --> There's no output; there are two b's, and we need three.
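
Because egrep looks for the pattern anywhere in the line, a bare count such as b{2} is satisfied by any run of at least two b's. The sketch below, which assumes grep -E as the equivalent of egrep and uses made-up variable names, shows how fixing the character before the b's makes the count exact.

```shell
# Unanchored: 'b{2}a' finds 'bba' inside 'Abbba', so the line matches.
loose=$(echo 'Abbba is great' | grep -Ei 'b{2}a' || true)
# Pinned by a leading 'a': exactly two b's must sit between the a's.
strict_three=$(echo 'Abbba is great' | grep -Ei 'ab{2}a' || true)
strict_two=$(echo 'Abba is great' | grep -Ei 'ab{2}a' || true)
# A range: between one and three b's are accepted between the a's.
ranged=$(echo 'Abbba is great' | grep -Ei 'ab{1,3}a' || true)

echo "loose:        $loose"         # Abbba is great
echo "strict_three: $strict_three"  # (empty)
echo "strict_two:   $strict_two"    # Abba is great
echo "ranged:       $ranged"        # Abbba is great
```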

We will encounter situations where we want to find a combination of characters repeating themselves. We already know that if we use something like ab+ or ab* only the letter b is affected. We can solve this problem using parentheses. Having (ab)+ means there's one or more repetitions of 'ab' in the string.
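
The difference between ab+ and (ab)+ is easier to see on a few test strings. This is only a sketch: the sample strings and variable names are invented, and the patterns are anchored with ^ and $ so each must cover the whole line.

```shell
# Sample strings (made up for this illustration), one per line.
samples=$(printf '%s\n' 'ab' 'abab' 'abb' 'aab')

# '(ab)+': the pair 'ab' repeated one or more times.
grouped=$(printf '%s\n' "$samples" | grep -E '^(ab)+$' || true)
# 'ab+': one a, then one or more b's; only the b repeats.
ungrouped=$(printf '%s\n' "$samples" | grep -E '^ab+$' || true)

echo "grouped:   $(echo "$grouped" | tr '\n' ' ')"    # ab abab
echo "ungrouped: $(echo "$ungrouped" | tr '\n' ' ')"  # ab abb
```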


Or Operator

When it comes to PHP, || stands for the OR operator. In regular expressions, the symbol '|' plays a similar role: it matches either the expression on its left or the one on its right. If you want to find Judas Priest and Iron Maiden in the file called band_names, this is what you type:

egrep -i 'judas|maiden' band_names ---> outputs Judas Priest, Iron Maiden

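egrep -i '^(ju){1}|(en)+$' band_names ---> outputs Judas Priest, Iron Maiden, Dokken. The '|' splits the whole pattern in two, so it reads as "lines that start with 'ju'" or "lines that end with one or more 'en'". If you can't understand why, it'd be a good idea to re-read this tutorial before you proceed.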

egrep -i 'b(r|l)' band_names --> Brocas Helm, Black Sabbath, Blue Oyster Cult
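
Parentheses decide how far the '|' reaches. The following sketch assumes grep -E as the equivalent of egrep; the variable names are hypothetical.

```shell
# Recreate the band_names file so the example is self-contained.
printf '%s\n' 'Iced Earth' 'Brocas Helm' 'Judas Priest' 'Cauldron Born' \
  'Iron Maiden' 'Black Sabbath' 'Dokken' 'Blue Oyster Cult' \
  'Accept' 'Deep Purple' 'Nightwish' 'UDO' > band_names

# Without parentheses the '|' splits the whole pattern:
# lines that start with 'ju' OR lines that end with 'en'.
either=$(grep -Ei '^ju|en$' band_names || true)
# With parentheses the anchors apply to both alternatives:
# the whole line would have to be 'ju' or 'en', so nothing matches.
whole_only=$(grep -Ei '^(ju|en)$' band_names || true)
# A group can also share a prefix or suffix: 'iron ' or 'deep ' at the start.
first_word=$(grep -Ei '^(iron|deep) ' band_names || true)

echo "either:     $(echo "$either" | tr '\n' ',')"
echo "whole_only: $whole_only"
echo "first_word: $(echo "$first_word" | tr '\n' ',')"
```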
 

The Usefulness of Brackets

If you can understand everything so far, and you can understand the role of brackets in regular expressions, everything else will be easy for you. Brackets specify which characters are allowed in a single position of a string. Typing [ok] is no different from typing (o|k). Likewise, typing [abc] is a replacement for (a|b|c). But brackets aren't just a replacement for '|'. The expression [a-c] is the same as [abc], and typing [a-z] will save you from the hassle of having to type (a|b|c|d|e|f|...) all the way to z. A single pair of brackets can hold several ranges. For example, [a-zA-Z0-9] covers all the alphanumeric characters.

echo 'The cost is $55' | egrep '\$[0-9]' --> output is The cost is $55, since the string has the $ sign followed by at least one digit. Notice the backslash '\' before the '$'. The backslash tells egrep to take the $ literally, not as the end-of-line symbol.
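
Here is a short sketch that combines brackets with the escaped '$'. It assumes grep -E as the equivalent of egrep; the second test sentence and the variable names are made up for the illustration.

```shell
# Recreate the band_names file so the example is self-contained.
printf '%s\n' 'Iced Earth' 'Brocas Helm' 'Judas Priest' 'Cauldron Born' \
  'Iron Maiden' 'Black Sabbath' 'Dokken' 'Blue Oyster Cult' \
  'Accept' 'Deep Purple' 'Nightwish' 'UDO' > band_names

# '[bd]' allows exactly one character, either b or d, at the start.
b_or_d=$(grep -Ei '^[bd]' band_names || true)
# '\$' is a literal dollar sign; '[0-9]+' is one or more digits.
price=$(echo 'The cost is $55' | grep -E '\$[0-9]+' || true)
# A dollar sign that is not followed by a digit does not match.
no_price=$(echo 'Pay in $ or euros' | grep -E '\$[0-9]+' || true)

echo "b_or_d:   $(echo "$b_or_d" | tr '\n' ',')"
echo "price:    $price"
echo "no_price: $no_price"
```

The patterns are written in single quotes so that the shell passes the backslash and the dollar sign to grep untouched.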

Copyright © 2008 Redacron Studios. Design by R.P Carbonell.