News

03-24-2009: PHP Regular Expressions tutorial added to the PHP tutorials section.


Regular Expressions (part 2)

<< Return to Page 1

At the start of this tutorial, you learned that the symbol '^' marks the start of a string, so '^a[1-9]' means that the string must begin with an a followed by a digit from 1 to 9. Inside brackets, however, '^' has a different meaning: placed right after the opening bracket, it negates the set. The expression '^a[^1-9]' therefore means 'any string that begins with an a followed by a character that is not a digit from 1 to 9'. Let's go back to the hard rock band names we saw at the start of this tutorial and use bracket variations to see what happens:

cat band_names | egrep -i '^b[l]' --> outputs Black Sabbath, Blue Oyster Cult, since the expression means 'any string that begins with a b followed by an l'.

cat band_names | egrep -i '^b[^l]' --> outputs Brocas Helm, since the expression means 'any string that begins with a b followed by any character other than l'.

cat band_names | egrep -i '^(b|a)[^l]' --> outputs Brocas Helm, Accept. Keep in mind that this expression wants a string that begins with either a b or an a, and the next character must not be an l.

cat band_names | egrep -i '^(b|a)[^c-l]' --> outputs Brocas Helm. This expression wants a string that begins with either a b or an a, and the next character must not be any letter of the alphabet from c to l.
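If you don't have egrep at hand, the same four expressions can be tried in PHP. This is only a sketch: the $bands array holds just the four names that appear in the outputs above (the real band_names file may contain more), and bands_matching is a helper name made up for this example. The i modifier plays the role of egrep's -i flag.

```php
<?php
// Assumption: only the four band names visible in the outputs above.
$bands = ['Black Sabbath', 'Blue Oyster Cult', 'Brocas Helm', 'Accept'];

// Hypothetical helper: return the names that match $pattern, reindexed from 0.
function bands_matching(string $pattern, array $names): array
{
    return array_values(preg_grep($pattern, $names));
}

$followedByL    = bands_matching('/^b[l]/i', $bands);
$notFollowedByL = bands_matching('/^b[^l]/i', $bands);
$bOrANotL       = bands_matching('/^(b|a)[^l]/i', $bands);
$bOrANotCtoL    = bands_matching('/^(b|a)[^c-l]/i', $bands);

echo implode(', ', $followedByL), "\n";    // Black Sabbath, Blue Oyster Cult
echo implode(', ', $notFollowedByL), "\n"; // Brocas Helm
echo implode(', ', $bOrANotL), "\n";       // Brocas Helm, Accept
echo implode(', ', $bOrANotCtoL), "\n";    // Brocas Helm
```

Accept drops out of the last list because its second letter, c, falls inside the c-l range.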

The backslash '\' tells egrep to interpret the symbol that follows literally, so '\.' stands for just a dot, not 'any character'. When the string you want to match contains symbols that regular expressions treat as special, like '.' or '?', preceding the symbol with a backslash solves the problem. If you have the advantage of using Linux as an operating system, type this line in your console; egrep prints the line back because it matches the expression:

echo 'http://www.cnn.com' | egrep 'http:\/\/w{3}\.cnn\.[a-z]{3}'
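The same idea can be checked from PHP. The sketch below reuses the expression from the egrep line, then compares an escaped and an unescaped dot against a made-up string (wwwXcnnYcom is only there for illustration). It also uses preg_quote, a standard PHP function that adds the backslashes for you.

```php
<?php
$url = 'http://www.cnn.com';

// Same expression as the egrep line above, wrapped in / delimiters.
$escaped = preg_match('/http:\/\/w{3}\.cnn\.[a-z]{3}/', $url);

// An unescaped dot matches any character, so this made-up string passes.
$loose = preg_match('#^www.cnn.com$#', 'wwwXcnnYcom');

// With escaped dots, only literal dots are accepted.
$strict = preg_match('#^www\.cnn\.com$#', 'wwwXcnnYcom');

// preg_quote() escapes the special characters in a plain string.
$quoted = preg_quote('www.cnn.com');

var_dump($escaped, $loose, $strict); // int(1) int(1) int(0)
echo $quoted, "\n";                  // www\.cnn\.com
```

Using # as the delimiter in the second and third patterns avoids having to escape forward slashes.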

 

The Backslash in PHP

PHP has other uses for the backslash that make it a good alternative to brackets. We already know that [a-zA-Z0-9] stands for any alphanumeric character. If you see the expression 'www', you immediately conclude that it only matches a string that contains three w's. The expression '\w', however, has nothing to do with the letter w: it means the same thing as [a-zA-Z0-9_], that is, any alphanumeric character or an underscore.

\w --> Word character, [a-zA-Z0-9_].

\W --> Non-word character, [^a-zA-Z0-9_].

\d --> Digit character, [0-9].

\D --> Non-digit character, [^0-9].

\s --> Whitespace character, [\n\r\f\t ].

\S --> Non-whitespace character, [^\n\r\f\t ].
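A quick way to convince yourself that each shorthand behaves like its bracket version is to run both against the same text. In this sketch the sample string and the helper all_matches are made up for the example, and the shorthand forms are written on their own, without surrounding brackets.

```php
<?php
// Made-up sample: letters, digits, an underscore, a colon and whitespace.
$sample = "user_42 logged in at 10:30\n";

// Hypothetical helper: list every piece of $text that matches $pattern.
function all_matches(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $found);
    return $found[0];
}

$wordShort  = all_matches('/\w+/', $sample);
$wordLong   = all_matches('/[a-zA-Z0-9_]+/', $sample);
$digitShort = all_matches('/\d+/', $sample);
$digitLong  = all_matches('/[0-9]+/', $sample);
$spaceCount = count(all_matches('/\s/', $sample));

echo implode('|', $wordShort), "\n";  // user_42|logged|in|at|10|30
echo implode('|', $digitShort), "\n"; // 42|10|30
echo $spaceCount, "\n";               // 5 (four spaces and one newline)
```

Note that user_42 comes back in one piece, because the underscore counts as a word character.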

Keep in mind that \w, \d, \W, etc. work both on their own and inside brackets, where they can be combined with other characters. Used with the function preg_match, the regular expression [\s\w]+ matches the first run of whitespace and word characters (letters, digits and underscores) in the string. For example,

<?php
$msg = "This is good text";

// $matches[0] receives the text matched by the whole expression.
preg_match('/[\s\w]+/', $msg, $matches);

foreach ($matches as $onematch)
{
    echo $onematch; // output will be This is good text
}
?>

Now, if you change $msg to "This is good text (and this part must be included)", you will notice that This is good text continues to be the output (now followed by the space that precedes the opening parenthesis). This is because neither \s nor \w matches a parenthesis, so the match stops there. If you change the regular expression from [\s\w]+ to [\s\d]+, the only thing matched in the original $msg is the single space after This, so you won't see any visible output. Change $msg to "(1) This is good text" and the output is 1, because the first character that fits the expression is the digit, and the ) right after it ends the match.

Feel free to experiment with the function preg_match, but keep in mind that it stops at the first part of the string that matches the regular expression. If the regular expression is [\s\W]+ and $msg = "(1) This is good text", then ( will be the output, not ( ). If you want the closing parenthesis as well, you need preg_match_all, which returns every match and is more complex than its counterpart. More on this in another tutorial: Preg Function
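To see the difference between the two functions side by side, here is a small sketch that runs the expressions discussed above against "(1) This is good text". The variable names are chosen for this example only.

```php
<?php
$msg = '(1) This is good text';

// preg_match stops after the first match.
preg_match('/[\s\W]+/', $msg, $first);

// preg_match_all keeps going and collects every match in $all[0].
preg_match_all('/[\s\W]+/', $msg, $all);

// With [\s\d]+ the first match is the digit inside the parentheses.
preg_match('/[\s\d]+/', $msg, $digit);

var_dump($first[0]); // string(1) "("
var_dump($all[0]);   // five entries: "(", ") ", " ", " ", " "
var_dump($digit[0]); // string(1) "1"
```

The second entry returned by preg_match_all is the closing parenthesis together with the space after it, since both fit [\s\W]+ and sit next to each other.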

 

Regular Expressions Examples

Before we finish here, let's look at some useful Regular Expressions, starting with the simplest ones:

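if (eregi("^((root)|(bin)|(daemon)|(adm)|(lp)|(sync)|(shutdown)|(halt)|(mail)|(news)|(uucp)|(operator)|(mysql)|(httpd)|(www)|(cvs)|(shell)|(ftp)|(irc)|(download)|(guest))$", $name)) return false;

Explanation: false is returned if $name matches, regardless of case, any of the values inside the parentheses. This is useful if you don't want a user to register those values as usernames and cause trouble. Make sure no spaces or empty alternatives (||) slip in between the values, since they change what the expression matches. Also note that eregi was deprecated in PHP 5.3 and removed in PHP 7; on newer versions, use preg_match with the i modifier instead.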
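Since eregi is not available in PHP 7 and later, here is one way the same check could look with preg_match. Treat it as a sketch: the function name is_reserved_username is made up for this example, the inner parentheses around each value are dropped because nothing is captured from them, and the i modifier provides the case-insensitive matching that eregi did.

```php
<?php
// Hypothetical helper, not part of the original tutorial.
function is_reserved_username(string $name): bool
{
    // ^ and $ make sure the whole name, not just a part of it, is compared.
    $reserved = '/^(root|bin|daemon|adm|lp|sync|shutdown|halt|mail|news|uucp|'
              . 'operator|mysql|httpd|www|cvs|shell|ftp|irc|download|guest)$/i';

    return preg_match($reserved, $name) === 1;
}

var_dump(is_reserved_username('root'));     // bool(true)
var_dump(is_reserved_username('Daemon'));   // bool(true)
var_dump(is_reserved_username('rootbeer')); // bool(false)
var_dump(is_reserved_username('alice'));    // bool(false)
```

A registration script would then reject the name whenever the function returns true.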

'/^[^\W][a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)*\@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,6}$/'

Explanation: this one is very useful, for it lets you check whether a string looks like a valid e-mail address. Notice the ^[^\W]. It means the e-mail address cannot start with a non-word character. The expression (\.[a-zA-Z0-9_-]+) is followed by a *, which means this sequence may appear zero or more times in the e-mail address. Notice that inside the parentheses we have a \. which stands for a literal dot. This part of the expression allows e-mail addresses like john.doe@hotmail.com and redes.usa@yahoo.com to be valid. The expression ends with [a-zA-Z]{2,6}, which requires the last part to be two to six letters, so you cannot have numbers instead of .com, as in johndoe@hotmail.77, which isn't exactly an e-mail address. You cannot have johndoe@hotmail.c either, since a single letter is too short.
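The addresses mentioned above can be run through the expression to confirm the explanation. In this sketch the last candidate, a@example.com, is an extra made-up address: it shows that the expression also rejects a one-character name before the @, because ^[^\W] and the [a-zA-Z0-9_-]+ that follows each need at least one character.

```php
<?php
$emailPattern = '/^[^\W][a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)*\@[a-zA-Z0-9-]+'
              . '(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,6}$/';

$candidates = [
    'john.doe@hotmail.com',
    'redes.usa@yahoo.com',
    'johndoe@hotmail.77',
    'johndoe@hotmail.c',
    'a@example.com', // made-up address: only one character before the @
];

$results = [];
foreach ($candidates as $address) {
    $results[$address] = preg_match($emailPattern, $address) === 1;
    echo $address, ' => ', $results[$address] ? 'valid' : 'rejected', "\n";
}
```

Only the first two addresses come out as valid. If the one-character case matters to you, PHP's built-in filter_var with FILTER_VALIDATE_EMAIL is worth comparing against.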

preg_match('/\(\(([a-zA-Z0-9_ ]{1,})\)\)/', $msg, $matches);

Explanation: not exactly a regular expression you will see very often. Due to the \(\( at the start of the expression and the \)\) at the end, only text wrapped in double parentheses is a match, as in $msg = "((Here we are))". The unescaped parentheses in the middle capture the text between them, which ends up in $matches[1].
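Here is a short sketch of what preg_match hands back for this expression. The surrounding words Intro and outro are made up to show that the double parentheses can sit anywhere in the string, since the expression has no ^ or $; the character class is written with A-Z.

```php
<?php
$pattern = '/\(\(([a-zA-Z0-9_ ]{1,})\)\)/';

// Double parentheses in the middle of a longer string.
$found = preg_match($pattern, 'Intro ((Here we are)) outro', $matches);

// Single parentheses are not enough.
$single = preg_match($pattern, '(Here we are)');

var_dump($found);      // int(1)
var_dump($matches[0]); // string(15) "((Here we are))"
var_dump($matches[1]); // string(11) "Here we are"
var_dump($single);     // int(0)
```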

preg_match('/http:\/\/www\.[\w]+\.com/', $msg, $matches);

This one isn't rocket science. Matches will be website addresses like http://www.google.com and http://www.yahoo.com, while addresses like www.google.com or http://google.com will be ignored.
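The four addresses above can be filtered in one go with preg_grep. This is just a sketch; the last line uses a made-up sentence to point out that, without ^ and $, the expression also matches an address embedded in longer text.

```php
<?php
$pattern = '/http:\/\/www\.[\w]+\.com/';

$urls = [
    'http://www.google.com',
    'http://www.yahoo.com',
    'www.google.com',
    'http://google.com',
];

// Keep only the addresses that match, reindexed from 0.
$matched = array_values(preg_grep($pattern, $urls));

// No anchors, so a match anywhere in the string counts.
$embedded = preg_match($pattern, 'visit http://www.google.com/search now');

print_r($matched);   // http://www.google.com and http://www.yahoo.com
var_dump($embedded); // int(1)
```

Add ^ at the start and $ at the end of the expression if the whole string must be the address and nothing else.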

For more information, visit the following pages:

PHP_regular_expressions_examples
Sitepoint's PHP_regular_expressions_examples

<< Return to Page 1






Copyright © 2008 Redacron Studios. Design by R.P Carbonell.