7 May 2012 01:32
grep and options that print matches
Joerg Sonnenberger <joerg <at> britannica.bec.de>
2012-05-06 23:32:33 GMT
Hi all,
while looking again at the BSD grep code, I stumbled over the
inconsistent behavior of GNU grep when there are overlapping matches.
If either --color or --only-matching is specified, it becomes important
to decide which part of a line matches and, when more than one expression
is used, in which order the matches are reported. A few examples show the issue:
    echo abcde | grep -o -e 'ab' -e 'cde'
This prints two lines, "ab" and "cde". This is expected behavior.
    echo abcde | grep -o -e 'abc' -e 'cde'
This prints one line, "abc". IMO this is wrong -- the second pattern
certainly matches the input line and should get output.
    echo abcdeabc | grep -o -e 'ab' -e 'cde'
This prints three lines. The newer grep versions justify this by a
change in the man page (-o prints each match on a separate line). It
doesn't exactly explain the order ("ab", "cde", "ab") though. I would
consider "ab", "ab", "cde" as output quite a bit more logical.
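The two orderings discussed here can be sketched in a few lines of Python. This is only an illustration, not grep's implementation: the function names are made up, Python's re module stands in for grep's matcher, and overlapping matches are not handled (the example has none).

```python
import re


def matches_by_position(patterns, line):
    """Report matches in the order they occur in the line."""
    found = []
    for index, pattern in enumerate(patterns):
        for m in re.finditer(pattern, line):
            # Sort key: start offset first, then the pattern's position
            # on the command line as a tie-breaker.
            found.append((m.start(), index, m.group()))
    found.sort()
    return [text for _, _, text in found]


def matches_by_pattern(patterns, line):
    """Report all matches of the first pattern, then the second, and so on."""
    return [m.group() for pattern in patterns for m in re.finditer(pattern, line)]


if __name__ == "__main__":
    print(matches_by_position(["ab", "cde"], "abcdeabc"))
    print(matches_by_pattern(["ab", "cde"], "abcdeabc"))
```

The first function gives the order observed with GNU grep ("ab", "cde", "ab"); the second gives the per-pattern order ("ab", "ab", "cde").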
    echo abc | grep -o -e '..'
This prints one line, "ab". This means the match is greedy (the text
consumed by "ab" is not rescanned, so "bc" is never printed), even though
this is documented nowhere.
    echo abcd | grep -o -e '..' -e '.*'
This prints one line, "abcd". So the longest match wins.
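All five observations above are consistent with a simple model: scan left to right, take the leftmost match over all patterns, prefer the longest on a tie, and continue after the end of that match. A rough Python sketch of that model follows. It is an assumption about the observable behavior, not a description of the GNU grep source; only_matching is a made-up name, and Python's re module is not a POSIX matcher, so the results may differ for patterns more complicated than the ones used here.

```python
import re


def only_matching(patterns, line):
    """Model of 'grep -o' with several -e patterns (hypothetical helper)."""
    compiled = [re.compile(p) for p in patterns]
    out = []
    pos = 0
    while pos <= len(line):
        best = None
        for rx in compiled:
            m = rx.search(line, pos)
            if m is None:
                continue
            # Leftmost start wins; on equal starts the longer match wins.
            key = (m.start(), -(m.end() - m.start()))
            if best is None or key < best[0]:
                best = (key, m)
        if best is None:
            break
        m = best[1]
        if m.end() == m.start():
            # Empty match: print nothing and step over one character.
            pos = m.end() + 1
            continue
        out.append(m.group())
        # Consumed text is never rescanned, so matches cannot overlap.
        pos = m.end()
    return out


if __name__ == "__main__":
    print(only_matching(["ab", "cde"], "abcde"))
    print(only_matching(["abc", "cde"], "abcde"))
    print(only_matching(["ab", "cde"], "abcdeabc"))
    print(only_matching([".."], "abc"))
    print(only_matching(["..", ".*"], "abcd"))
```

Under this model "cde" is never reported in the second example because the "c" was already consumed by "abc".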