Joerg Sonnenberger | 7 May 2012 01:32

grep and options that print matches

Hi all,
while looking again at the BSD grep code, I stumbled over the
inconsistent behavior of GNU grep when there are overlapping matches.
If either --color or --only-matching is specified, it becomes important
to decide which part of a line matches and, when more than one
expression is used, in which order the matches are reported. A few
examples show the issue:

echo abcde | grep -o -e 'ab' -e 'cde'

This prints two lines, "ab" and "cde". This is expected behavior.

echo abcde | grep -o -e 'abc' -e 'cde'

This prints one line, "abc". IMO this is wrong -- the second pattern
certainly matches the input line and should be printed as well.

echo abcdeabc | grep -o -e 'ab' -e 'cde'

This prints three lines. The newer grep versions justify this by a
change in the man page (-o prints each match on a separate line). It
doesn't exactly explain the order ("ab", "cde", "ab") though. I would
consider "ab", "ab", "cde" as output quite a bit more logical.

echo abc | grep -o -e '..'

This prints one line, "ab". This means the match is greedy: once "ab"
has been consumed, the overlapping "bc" is not reported. This behavior
is documented nowhere.

echo abcd | grep -o -e '..' -e '.*'

This prints one line, "abcd". So the longest match wins.
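
All five outputs above are consistent with one simple rule: scan left to
right, take the match that starts earliest, prefer the longest one on a
tie, and resume after its end. The Python sketch below models that rule
only. It is not GNU grep's implementation, the function name
only_matching is made up for illustration, and it assumes Python's re
module matches these simple patterns the same way grep does.

```python
import re


def only_matching(line, patterns):
    """Model of `grep -o` with several -e patterns (hypothetical helper).

    Rule assumed here: earliest start wins, longest match breaks ties,
    and scanning resumes at the end of the match just printed.
    """
    compiled = [re.compile(p) for p in patterns]
    out = []
    pos = 0
    while pos <= len(line):
        best = None  # (start, end) of the best candidate found so far
        for rx in compiled:
            m = rx.search(line, pos)
            if m is None:
                continue
            cand = (m.start(), m.end())
            if (best is None
                    or cand[0] < best[0]
                    or (cand[0] == best[0] and cand[1] > best[1])):
                best = cand
        if best is None:
            break
        start, end = best
        if start == end:
            # Empty match: nothing to print, step one character forward.
            pos = start + 1
            continue
        out.append(line[start:end])
        pos = end  # text already printed is never matched again
    return out


if __name__ == "__main__":
    for text, pats in [
        ("abcde", ["ab", "cde"]),
        ("abcde", ["abc", "cde"]),
        ("abcdeabc", ["ab", "cde"]),
        ("abc", [".."]),
        ("abcd", ["..", ".*"]),
    ]:
        print(text, pats, only_matching(text, pats))
```

Under this model, "cde" is lost in the second example because the scan
resumes after "abc" and only "de" is left, and the "ab", "cde", "ab"
order in the third example is simply the order of positions in the line.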