Lecture Nine

Regular Expressions

Regular Expressions

  • many Unix utilities use regular expressions: grep, sed, awk, vi, perl, Tcl
  • shell filename wildcards (eg. *.c) are not regular expressions
  • examples in this lecture will use the grep utility and the file cars
  • regular expressions are used to search for or match text:
    • literal text can be used to search for that text
      • grep "chevy" cars
    • . matches any single character (similar to ? wildcard)
      • grep ".c" cars
      • grep "5..." cars
    • [ ] matches any single character within the square brackets (similar to [ ] wildcard)
      • grep "[cC]hevy" cars
      • grep "[0-9][0-9][0-9][0-9][0-9]" cars
    • [^ ] matches any single character not within the square brackets (similar to [! ] wildcard)
      • grep "[^f]ord" cars
    • ^ matches beginning of line
      • grep "^f" cars
    • $ matches end of line
      • grep " [0-9][0-9][0-9]$" cars
    • * following any character denotes zero or more occurrences of that character
      • grep "ford.*83" cars
      • grep "^[^ ]* *[^ ]* *65" cars
    • \ inhibits meaning of special characters
      • grep ' [0-9][0-9][0-9]\$' cars
  • extended regular expressions are not recognized by plain grep; use egrep or grep -E instead:
    • ( reg-exp ) parentheses used for grouping
      • egrep "^([^ ]+ +){2}65" cars
      • egrep "ford" cars
    • | means OR, matches reg-exp on either side of the vertical bar
      • egrep "ford|chevy" cars
  • regular expression characters may or may not need to be escaped - varies from program to program
    • eg. egrep and awk use ( and ) for grouping, sed uses \( and \) unless the -r option is used
  • regular expressions may or may not need delimiters - varies from program to program
    • eg. grep and egrep don't use delimiters; sed and awk do, eg. /string/
  • other examples of regular expressions
    (Mr|Mrs) Smith - match either "Mr Smith" or "Mrs Smith"
    [a-zA-Z]+ - match one or more letters
    ^[a-zA-Z]*$ - match lines with only letters
    [^0-9]+ - match a string of one or more characters containing no digits
    [+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)? - match valid "C" programming numbers
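
The patterns above can be tried out with a short script. This is only a sketch: the lecture's cars file is not reproduced in these notes, so the script builds a hypothetical stand-in (make, model, year, mileage, price, separated by single spaces) in a temporary file. The variable names are illustrative, and the counts it prints apply to this sample data only.

```shell
#!/bin/sh
# Hypothetical stand-in for the lecture's "cars" file; the real file may differ.
cars=$(mktemp)
cat > "$cars" <<'EOF'
plym fury 77 73 2500
chevy nova 79 60 3000
ford mustang 65 45 10000
volvo gl 78 102 9850
ford ltd 83 15 10500
Chevy nova 80 50 3500
fiat 600 65 115 450
honda accord 81 30 6000
ford thundbd 84 10 17000
toyota tercel 82 180 750
chevy impala 65 85 1550
ford bronco 83 25 9500
EOF

# Basic regular expressions (plain grep)
chevy_any_case=$(grep -c "[cC]hevy" "$cars")              # bracket expression
not_f_ord=$(grep "[^f]ord" "$cars")                       # negated brackets: "accord" but not "ford"
starts_with_f=$(grep -c "^f" "$cars")                     # ^ anchors at start of line
three_digit_price=$(grep -c " [0-9][0-9][0-9]$" "$cars")  # $ anchors at end of line
ford_83=$(grep -c "ford.*83" "$cars")                     # .* is any run of characters

# Extended regular expressions (grep -E)
year_65=$(grep -E -c "^([^ ]+ +){2}65" "$cars")           # skip two fields, then 65
ford_or_chevy=$(grep -E -c "ford|chevy" "$cars")          # alternation

# Backslash removes the special meaning of $ (single quotes protect it from the shell)
literal_dollar=$(printf '%s\n' 'price 500$' 'price 500' | grep -c '[0-9]\$')

echo "chevy or Chevy:     $chevy_any_case"
echo "[^f]ord matched:    $not_f_ord"
echo "lines starting f:   $starts_with_f"
echo "three-digit prices: $three_digit_price"
echo "ford ... 83:        $ford_83"
echo "third field 65:     $year_65"
echo "ford or chevy:      $ford_or_chevy"
echo "literal dollar:     $literal_dollar"

rm -f "$cars"
```

Single quotes are the safer choice whenever a pattern contains $ or \, since the shell would otherwise try to interpret those characters before grep sees them.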

grep

  • uses regular expression for pattern, eg. grep 'reg-exp' filename, then prints matched lines
  • gives 0 exit status if pattern matched, non-zero if it did not
  • options:
    • -c - counts matched lines instead of printing them
    • -i - ignores case
    • -n - precedes each line with a line number
    • -v - reverses sense of test, ie. finds lines not matching pattern
  • examples, using the file cars
    • grep 'chevy' cars - display only lines containing the string "chevy"
    • grep -c 'chevy' cars - display count of lines containing the string "chevy"
    • grep -i 'chevy' cars - display only lines containing the string "chevy", ignoring case
    • grep -ic 'chevy' cars - display count of lines containing the string "chevy", ignoring case
    • grep -v 'chevy' cars - display only lines not containing the string "chevy"
    • grep -ivc 'chevy' cars - display count of lines not containing the string "chevy", ignoring case
    • grep -n 'chevy' cars - display only lines containing the string "chevy", with line numbers
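
The options and the exit status can be combined in a script, roughly as follows. This is a minimal sketch that assumes a small hypothetical cars file, since the lecture's own file is not shown here; the variable names are illustrative. It relies on grep returning 0 when at least one line matches and 1 when none does, which is the behaviour POSIX specifies.

```shell
#!/bin/sh
# Hypothetical four-line stand-in for the lecture's "cars" file.
cars=$(mktemp)
cat > "$cars" <<'EOF'
chevy nova 79 60 3000
ford mustang 65 45 10000
Chevy nova 80 50 3500
honda accord 81 30 6000
EOF

count=$(grep -c 'chevy' "$cars")            # lines containing "chevy"
count_nocase=$(grep -ic 'chevy' "$cars")    # same, ignoring case
count_not=$(grep -ivc 'chevy' "$cars")      # lines NOT containing it, ignoring case
numbered=$(grep -n 'chevy' "$cars")         # matching lines prefixed with "N:"

# Use the exit status directly in a test; output is discarded.
if grep 'ford' "$cars" > /dev/null; then
    found_ford=yes
else
    found_ford=no
fi

# Capture the status of a search that finds nothing.
status_nomatch=0
grep 'volvo' "$cars" > /dev/null || status_nomatch=$?

echo "chevy: $count, ignoring case: $count_nocase, not chevy: $count_not"
echo "numbered: $numbered"
echo "ford found: $found_ford, status for volvo: $status_nomatch"

rm -f "$cars"
```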