Regular Expressions



Introduction

Regexp (regular expression) meta-characters

     "1133*" matches 11 + one or more 3's + possibly other characters:
     113, 1133, 111312, and so forth.                                 
     "13." matches 13 plus at least one of any character (including a
     space): 1133, 11333, but not 13 (additional character missing).
     ".*" matches any number of any characters.
     "^$" matches blank lines.                                       
       "[xyz]" matches the characters x, y, or z.

       "[c-n]" matches any of the characters in the range c to n.

       "[B-Pk-y]" matches any of the characters in the ranges B to P and k to y

       "[a-z0-9]" matches any lowercase letter or any digit.

       "[^b-d]" matches all characters except those in the range b to d.
                (This is an instance of ^ negating or inverting the meaning of
                the following regexp, taking on a role similar to ! in a different
                context.)

       Combined sequences of bracketed characters match common word
       patterns.

       "[Yy][Ee][Ss]" matches yes, Yes, YES, yEs, and so forth.

       "[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]" matches any
       string in the format of a U.S. Social Security number (ddd-dd-dddd).
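
The meta-character and bracket examples above can be tried out with a short script. What follows is only a sketch in Python, whose re module is not the same dialect as grep or vi, although these particular patterns behave the same way in it. The helper name matching and all sample strings are made up for illustration; re.search is used because, like grep, it reports a match anywhere in the line.

```python
import re

def matching(pattern, lines):
    """Return the lines in which `pattern` matches somewhere (grep-like)."""
    return [line for line in lines if re.search(pattern, line)]

# Hypothetical sample lines; the empty string stands in for a blank line.
numbers = ["113", "1133", "111312", "13", ""]
print(matching(r"1133*", numbers))   # ['113', '1133', '111312']
print(matching(r"13.", numbers))     # ['1133', '111312']
print(matching(r"^$", numbers))      # ['']

# One-character strings make it easy to see what a bracket expression accepts.
letters = list("abcdexyz")
print(matching(r"[xyz]", letters))   # ['x', 'y', 'z']
print(matching(r"[c-n]", letters))   # ['c', 'd', 'e']
print(matching(r"[^b-d]", letters))  # ['a', 'e', 'x', 'y', 'z']

words = ["yes", "Yes", "YES", "yEs", "no"]
print(matching(r"[Yy][Ee][Ss]", words))  # ['yes', 'Yes', 'YES', 'yEs']

# The pattern checks only the ddd-dd-dddd shape, not whether the number is valid.
ssn_shape = r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]"
print(matching(ssn_shape, ["123-45-6789", "123-456-789"]))  # ['123-45-6789']
```

Note that "13." rejects 113 and 13 for the same reason: in both, nothing follows the 13.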
     A "\$" reverts back to its literal meaning of "$", rather than its
     regexp meaning of end-of-line. Likewise a "\\" has the literal meaning
     of "\".
       "him|her" matches "it belongs to him" and "it belongs to her"

       "(Memo|Report)20.\.txt" matches Memo201.txt, Report20a.txt, and
       Report209.txt; note use of grouping ().  Certain applications
       require the parens () to be escaped:  \( and \)

       $ w | grep "jchung\|clayton" # Note the "\|" in the grep regexp.
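
Escaping, alternation, and grouping can be checked the same way. This is a rough sketch in Python rather than grep: in Python's re module, | and () are special without a backslash, whereas basic grep patterns need \| and \( \) as in the command above. The helper name matching and the sample strings are invented for the example.

```python
import re

def matching(pattern, lines):
    """Return the lines in which `pattern` matches somewhere (grep-like)."""
    return [line for line in lines if re.search(pattern, line)]

# "\$" is a literal dollar sign; a bare "$" would mean end-of-line instead.
prices = ["cost: $5", "cost: 5"]
print(matching(r"\$", prices))   # ['cost: $5']

# "\\" is a literal backslash. "a\\b" is the three-character string a\b.
paths = ["a\\b", "a/b"]
print(matching(r"\\", paths))    # ['a\\b']

# Alternation: either "him" or "her".
sentences = ["it belongs to him", "it belongs to her", "it belongs to us"]
print(matching(r"him|her", sentences))

# Grouping limits the alternation to the file-name prefix.
names = ["Memo201.txt", "Report20a.txt", "Report209.txt",
         "Memo20.txt", "Notes201.txt"]
print(matching(r"(Memo|Report)20.\.txt", names))
```

Memo20.txt is rejected because the unescaped dot consumes the "." before "txt", leaving nothing for the escaped "\." to match.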

Extended regular expressions

     im?ing matches swiing, swiming, but not swimming
     9+ matches 9, 99, 999, but not 88
       A[0-9]{3} matches "A" followed by 3 digits: A123, and also A1234
                 (which contains A123), but not A12 34.

       [0-9]{4,6} matches any sequence of 4, 5 or 6 digits
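
The extended operators ?, +, {n} and {m,n} are spelled the same way in Python's re module, so the examples above can be sketched as follows. The helper name matching and the sample strings are illustrative assumptions, not part of any tool described here.

```python
import re

def matching(pattern, lines):
    """Return the lines in which `pattern` matches somewhere (grep-like)."""
    return [line for line in lines if re.search(pattern, line)]

# "?" makes the preceding "m" optional: zero or one occurrence.
print(matching(r"im?ing", ["swiing", "swiming", "swimming"]))
# ['swiing', 'swiming']

# "+" requires one or more of the preceding character.
print(matching(r"9+", ["9", "99", "999", "88"]))
# ['9', '99', '999']

# "{3}" asks for exactly three digits after "A"; A1234 still matches
# because it contains A123.
print(matching(r"A[0-9]{3}", ["A123", "A1234", "A12 34"]))
# ['A123', 'A1234']

# "{4,6}" takes between four and six digits, as many as it can get.
print(re.findall(r"[0-9]{4,6}", "12 1234 1234567"))
# ['1234', '123456']
```

In the last line the seven-digit run yields 123456 and leaves a lone 7, which is too short to match on its own.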

Simple regexp examples using the %s (search and replace) command in vi

    :%s/  */ /g          Change 1 or more spaces into a single space.
    :%s/ *$//            Remove all spaces from the end of the line.
    :%s/^/ /             Insert a space at the beginning of every line.
    :%s/^[0-9][0-9]* //  Remove a leading number, and the space after it,
                         from each line.
    :%s/b[aeio]g/bug/g   Change all occurrences of bag, beg, big, and bog to
                         bug.
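
For readers without vi at hand, the same substitutions can be approximated with Python's re.sub. This is a sketch of equivalent behavior on one line of text, not a description of how vi works internally; the variable names and input strings are made up. One difference to keep in mind: re.sub replaces every match by default (like the g flag in vi), and count=1 would limit it to the first match.

```python
import re

# :%s/  */ /g   squeeze runs of spaces into a single space
squeezed = re.sub(r"  *", " ", "a   b  c")

# :%s/ *$//     strip trailing spaces
stripped = re.sub(r" *$", "", "trailing   ")

# :%s/^/ /      insert a space at the start of the line
indented = re.sub(r"^", " ", "indent me")

# :%s/^[0-9][0-9]* //   drop a leading number and the space after it
unnumbered = re.sub(r"^[0-9][0-9]* ", "", "42 answers")

# :%s/b[aeio]g/bug/g    turn bag, beg, big and bog into bug
bugged = re.sub(r"b[aeio]g", "bug", "bag beg big bog")

for result in (squeezed, stripped, indented, unnumbered, bugged):
    print(repr(result))
```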

Medium regexp example using search and replace in vi

 Before                   After
 ------                   -----
 foo(10,7,2)              foo(7,10,2)
 foo(x+13,y-2,10)         foo(y-2,x+13,10)
 foo(bar(8),x+y+z,5)      foo(x+y+z,bar(8),5)

 The following substitution command will do the trick:

 :%s/foo(\([^,]*\),\([^,]*\),\([^)]*\))/foo(\2,\1,\3)/g

 [^,]  means any character which is not a comma.

 [^,]*  means 0 or more characters which are not commas.

 \([^,]*\)  uses the grouping operators \( and \) to tag the non-comma
 characters as \1 for use in the replacement part of the command.

 \([^,]*\),  means that we must match 0 or more non-comma characters
 which are followed by a comma. The non-comma characters are tagged.

 foo(\([^,]*\),  translates to "after you find foo(, tag all characters up to
 the next comma as \1".
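
The same argument swap can be sketched in Python to see the tagged groups at work. This is an approximation of the vi command, under the assumption that Python's re dialect is acceptable for illustration: there the grouping parentheses are written bare, and the literal parentheses of foo( ) are the ones that get a backslash. The names SWAP and swap_first_two are hypothetical.

```python
import re

# Group 1: first argument, group 2: second argument (neither may contain
# a comma), group 3: everything up to the next closing parenthesis.
SWAP = re.compile(r"foo\(([^,]*),([^,]*),([^)]*)\)")

def swap_first_two(line):
    """Exchange the first two arguments of every foo(...) call in `line`."""
    return SWAP.sub(r"foo(\2,\1,\3)", line)

for before in ["foo(10,7,2)", "foo(x+13,y-2,10)", "foo(bar(8),x+y+z,5)"]:
    print(before, "->", swap_first_two(before))
```

As in the vi version, a first or second argument that itself contains a comma would not be handled, because [^,]* stops at the first comma it sees.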