Exact Pattern Matching

Grep is great for basic pattern matching where you are looking for an exact pattern.

For example, to find all lines in an nginx log file that contain a specific IP address, you might do:

grep '' *.log
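One thing to watch with IP addresses is that an unescaped dot is a regex wildcard. The sketch below compares a plain pattern with a literal match using the -F flag. The log lines and the address 203.0.113.5 (a documentation-range IP) are invented for illustration.

```shell
# Hypothetical sample log; every address here is made up for the example.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.5 - - "GET /index.html HTTP/1.1" 200
203.0.113.50 - - "GET /about.html HTTP/1.1" 200
20310.113.5 - - "GET /odd.html HTTP/1.1" 404
EOF

# As a regex, each "." matches any character, and the pattern can also
# match inside a longer address such as 203.0.113.50.
regex_hits=$(grep -c '203.0.113.5' "$log")

# -F treats the pattern as a literal string; the trailing space stops
# 203.0.113.50 from matching.
literal_hits=$(grep -cF '203.0.113.5 ' "$log")

echo "regex: $regex_hits, literal: $literal_hits"
rm -f "$log"
```

The trailing space relies on the address being followed by a space in the log format, which is an assumption about the log layout.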

Complex Pattern Matching

By default, grep uses POSIX basic regular expressions, which lack features such as lazy quantifiers and lookarounds. If you are used to PHP regex, things get a lot easier with the -P flag (available in GNU grep), which switches grep to Perl Compatible Regular Expressions (PCRE), the same engine that PHP's preg functions use.

grep -P '/[^/]+?\.js' *.log
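To see what that pattern picks up, here is a small sketch that runs it against a few made-up request lines. It assumes GNU grep built with PCRE support, and adds -o so that only the matched text is printed.

```shell
# Hypothetical access-log lines, invented for illustration.
sample='GET /static/js/app.min.js HTTP/1.1
GET /index.html HTTP/1.1
GET /vendor/jquery.js HTTP/1.1'

# -P enables PCRE syntax such as the lazy quantifier +?;
# -o prints only the matched text instead of the whole line.
js_files=$(printf '%s\n' "$sample" | grep -Po '/[^/]+?\.js')

echo "$js_files"
```

Because [^/] cannot cross a slash, the match always starts at the last path segment before the .js extension.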

Pulling out Subpatterns

If you want to pull out a specific part of the match rather than print the whole matching line, the best way is to use lookbehind and lookahead combined with the -o flag, which prints only the matched text.

For example:

netstat -tulpn | grep 1080 | grep -Po '(?<=[^0-9])([0-9]+?)(?=/ssh)'

This command gives me the PID of the process listening on port 1080, in my case a local SOCKS5 proxy.

Note the above example also demonstrates using grep in a pipe.
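If you want to try the pattern without a proxy running, you can feed it a canned line. This is a sketch only: the line below mimics the shape of netstat -tulpn output, and the PID 4242 and the address are invented.

```shell
# One line in the shape that `netstat -tulpn` prints; the PID and
# address are hypothetical.
netstat_line='tcp        0      0 127.0.0.1:1080          0.0.0.0:*               LISTEN      4242/ssh'

# First grep narrows to the port, second grep extracts only the digits
# that sit between a non-digit and the literal "/ssh".
pid=$(printf '%s\n' "$netstat_line" \
  | grep 1080 \
  | grep -Po '(?<=[^0-9])([0-9]+?)(?=/ssh)')

echo "$pid"
```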

Look Ahead and Behind

You might want matches that enforce a particular pattern before or after your target pattern without including it in the result. For this we can use lookbehind and lookahead:

  • Lookbehind (?<=...)
  • Lookahead (?=...)
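Here is a minimal sketch of each one on its own, assuming GNU grep with -P support. The input line and its field names are made up for the example.

```shell
# Hypothetical input line.
line='user=alice id=42 mail=alice@example.com'

# Lookbehind: match digits only when they are preceded by "id=".
id_value=$(printf '%s\n' "$line" | grep -Po '(?<=id=)[0-9]+')

# Lookahead: match lowercase letters only when they are followed by "@".
mailbox=$(printf '%s\n' "$line" | grep -Po '[a-z]+(?=@)')

echo "id: $id_value, mailbox: $mailbox"
```

In both cases the surrounding text has to be present, but it is left out of what -o prints.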

Discarded Patterns

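If you want to group part of a pattern without capturing it, you can use (?:...)

The group must still match, and its text is still part of the overall match, so grep -o prints it. To require text before your target but leave it out of the output, use a lookbehind or \K, which drops everything matched so far from the reported match.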
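The difference is easy to check for yourself. The sketch below, which assumes GNU grep with -P support and uses an invented count and IP, runs the same input through a (?:...) group and through \K. With -o, grep prints the whole match, so the text matched by the (?:...) group still shows up in the output.

```shell
# Hypothetical "count IP" line, shaped like `uniq -c` output.
line='1042 203.0.113.5'

# Non-capturing group: the count is still part of the overall match.
with_group=$(printf '%s\n' "$line" | grep -Po '(?:[0-9]{4,} )(.*)')

# \K discards everything matched so far, so only the IP is reported.
with_k=$(printf '%s\n' "$line" | grep -Po '[0-9]{4,} \K.*')

echo "group: $with_group"
echo "\\K:    $with_k"
```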

Counting Matches

Let's say someone is attacking your server and you want to parse the exception logs to pull out the number of exceptions per IP.

This is one way to do that:

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' exception.log | sort | uniq -c

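If you then want to keep only the IPs that triggered 1000 or more exceptions, you could do:

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' exception.log | sort | uniq -c | grep -Po '[0-9]{4,} \K.*'

Here [0-9]{4,} requires a count of at least four digits, and \K drops the count from the output so that only the IP is printed.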
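Matching on the number of digits only works for thresholds that are powers of ten. As an alternative sketch, you could swap the last grep for awk and compare the count numerically. The log contents, the addresses, and the threshold of 3 are all invented so the example stays small.

```shell
# Hypothetical exception log; addresses are documentation-range IPs.
log=$(mktemp)
cat > "$log" <<'EOF'
[error] 203.0.113.5 NullPointerException
[error] 203.0.113.5 TimeoutException
[error] 198.51.100.7 TimeoutException
[error] 203.0.113.5 TimeoutException
EOF

threshold=3   # hypothetical cut-off; use 1000 for the case above

# Same extract/sort/count pipeline, then awk keeps rows whose count
# (column 1) is at least the threshold and prints the IP (column 2).
noisy_ips=$(grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$log" \
  | sort | uniq -c \
  | awk -v min="$threshold" '$1 >= min { print $2 }')

echo "$noisy_ips"
rm -f "$log"
```

This uses awk rather than grep for the final step, so it is a different technique from the one above, not a drop-in replacement.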