The Preg and Pattern Functions
Regular expressions can be used to search and modify text. Because they are slower than basic
string functions, they should be used only when those functions are not enough. This tutorial is
not about regular expressions but about the functions that use them. However, in order to use the
functions, you need to understand the patterns that make them work. A regular expression can be as
simple as the word cat. Hence, if the sentence John has a cat is a string, cat becomes the
substring the function will match. Unfortunately, it is not always this simple. If you want to
find strings that end with cat, like wildcat, then cat$ is the regular expression you need. If
you are searching for a string that begins with cat, like catholic, then the expression must be
preceded by a caret, like this: ^cat. When it comes to regular expressions, this is probably as
easy as it gets.
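To see the three patterns above side by side, here is a minimal sketch using preg_match, which returns 1 when the pattern matches and 0 when it does not. The sample strings are the ones mentioned in the text; the variable names are only illustrative.

```php
<?php
// Sample strings taken from the paragraph above.
$subjects = array('John has a cat', 'wildcat', 'catholic');

// cat anywhere, cat at the end of the string, cat at the start of the string.
$patterns = array('/cat/', '/cat$/', '/^cat/');

$results = array();
foreach ($patterns as $pattern) {
    foreach ($subjects as $subject) {
        // preg_match() returns 1 on a match and 0 when nothing matches.
        $found = preg_match($pattern, $subject) === 1;
        $results[$pattern][$subject] = $found;
        echo $pattern, ' vs "', $subject, '": ', ($found ? 'match' : 'no match'), "\n";
    }
}
```

Note that /cat$/ also matches John has a cat, because that sentence ends with cat too, while /^cat/ matches only catholic.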
There are two types of characters in a regular expression: literal characters and metacharacters.
Literal characters match themselves in a pattern, cat being a perfect example, while
metacharacters represent other values. A dot is a perfect example of a metacharacter.
It stands for any character except a newline, so c.t may match cat, cut, cot, c1t, c2t, and so on.
An asterisk * following a character means "zero or more", so if you have ca*t, with the asterisk
following the a, the pattern may match cat, but it may also match ct, caat, caaat, etc. Other
useful metacharacters are: +, ?, {}, [], and (). If you want metacharacters to match themselves,
you must precede them with a backslash (\).
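The dot, the asterisk, and the backslash escape can be tried out with a short script like the one below. This is only a sketch: the helper function matches() is a made-up name for this example, and the patterns are wrapped in ^ and $ so that the whole string has to fit the pattern.

```php
<?php
// Hypothetical helper for this example: true if $pattern matches $subject.
function matches($pattern, $subject)
{
    return preg_match($pattern, $subject) === 1;
}

// The dot stands for any single character except a newline.
var_dump(matches('/^c.t$/', 'cat'));   // dot matches the a
var_dump(matches('/^c.t$/', 'c1t'));   // dot matches the 1
var_dump(matches('/^c.t$/', 'ct'));    // nothing for the dot to match

// The asterisk means "zero or more" of the preceding character.
var_dump(matches('/^ca*t$/', 'ct'));     // zero a's
var_dump(matches('/^ca*t$/', 'caaat'));  // three a's
var_dump(matches('/^ca*t$/', 'cut'));    // u is not an a

// A backslash makes the dot match only a literal dot.
var_dump(matches('/^c\.t$/', 'c.t'));
var_dump(matches('/^c\.t$/', 'cat'));
```

The pattern strings use single quotes so that PHP passes the backslash through to the regular expression untouched.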
Note: in order to understand this tutorial, it is recommended that you develop a clear
concept of what regular expressions are. The following links can help you:
http://weblogtoolscollection.com/regex/regex.php?page=8
http://pa2.php.net/manual/en/reference.pcre.pattern.syntax.php
The preg_grep Function
We begin with the function preg_grep. Its syntax is: preg_grep(pattern, input, flags), where flags is optional. If you are familiar with Linux, you probably know that grep is a command that allows you to search one or more files for character patterns. Borrowing from the book Unix Shell Programming, we can get some useful information about grep. Its format is grep pattern files, similar to preg_grep's, but without parentheses. Let's say you have a list of horror movies, and you want to find out which movies in that list have the word Zombie in their titles. This is how you do it:
grep 'Zombie' movie_list.txt
If you are a zombie fan, grep will output many lines, but maybe you think zombies are repulsive, so grep only prints out two lines:
Zombie Creeping Flesh
Zombie Holocaust
Unfortunately, searching for patterns isn't always this easy. What if the word zombie is in the list, but it is not capitalized? Then grep would be unable to find it. Luckily for you, fixing this isn't much of a problem, for example with a character class:
grep '[Zz]ombie' movie_list.txt
Or better yet, grep -i 'zombie' movie_list.txt, where -i tells grep that the search is not case-sensitive. Taking more information from the book Unix Shell Programming, we see even better examples of grep at work:
grep '[A-Z]' list --> gets lines of data containing a capital letter.
grep '[0-9]' list --> gets lines of data containing a number.
grep '\.pic$' list --> gets lines from list that end in .pic.
In this case, the \ makes sure grep treats the dot as part of the pattern that must be matched (a dot without \ means "any character"). The $ symbol tells grep the line must end in .pic.
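As a quick preview, the same three patterns carry over to PHP almost unchanged once they are wrapped in / / delimiters. The sketch below assumes a made-up array standing in for the lines of the file list; the entries are invented for illustration.

```php
<?php
// Hypothetical lines standing in for the contents of the file "list".
$list = array('holiday.pic', 'notes from 2009', 'readme', 'Report', 'epic');

// Lines containing a capital letter.
$with_capital = preg_grep('/[A-Z]/', $list);

// Lines containing a number.
$with_digit = preg_grep('/[0-9]/', $list);

// Lines ending in .pic; the backslash makes the dot literal,
// so "epic" does not qualify.
$ends_in_pic = preg_grep('/\.pic$/', $list);

print_r($with_capital);
print_r($with_digit);
print_r($ends_in_pic);
```

Each call returns only the matching entries, and every entry keeps the key it had in the original array.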
Since this is not a Linux tutorial, let's return to preg_grep. Examine the following script:
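<?php
// Seven sample names; preg_grep() keeps only the ones that match the pattern.
$names = array('Robert', 'Jimmy', 'Albert', 'Richard', 'Kimberly', 'Cindy', 'Ziyi');

// ^[a-k] matches a first letter from a to k; the i modifier ignores case.
$grep_array = preg_grep('/^[a-k]/i', $names);

// preg_grep() preserves the original keys, so foreach is the simplest way to loop.
foreach ($grep_array as $output_this)
{
    echo "The output is ".$output_this."<br />";
}
/*
The output is Jimmy
The output is Albert
The output is Kimberly
The output is Cindy
*/
?>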
We have an array of seven names, but we only need those that start with a letter from a to k. What do we do? We create a pattern, ^[a-k], similar to the ones used by Linux's grep. Remember when I said Linux's grep could become case-insensitive if you added -i to the command? Well, here two slashes / / enclose the pattern and an i follows the last /, telling preg_grep to ignore case. Our i is a pattern modifier, and if you want detailed information on pattern modifiers, read the article at this link:
http://pa2.php.net/manual/en/reference.pcre.pattern.modifiers.php
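To tie preg_grep back to the zombie search from the grep example, here is a small sketch that also tries the optional third argument. It assumes a made-up movie array: only the two Zombie titles come from the example above, and the other two are invented. PREG_GREP_INVERT is the flag PHP provides for returning the entries that do not match.

```php
<?php
// Hypothetical movie list; the second and fourth titles are invented.
$movies = array(
    'Zombie Creeping Flesh',
    'The Haunted Cellar',
    'Zombie Holocaust',
    'dawn of the zombie',
);

// Case-sensitive search, like grep 'Zombie'.
$case_sensitive = preg_grep('/Zombie/', $movies);

// Case-insensitive search with the i modifier, like grep -i 'zombie'.
$any_case = preg_grep('/zombie/i', $movies);

// With PREG_GREP_INVERT, preg_grep returns the entries that do NOT match.
$no_zombies = preg_grep('/zombie/i', $movies, PREG_GREP_INVERT);

print_r($case_sensitive);
print_r($any_case);
print_r($no_zombies);
```

Because the returned arrays keep the original keys, $case_sensitive holds keys 0 and 2 rather than 0 and 1. If you need a freshly numbered array, passing the result through array_values is one way to get it.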



