Michal Privoznik | 6 Mar 11:59 2012
Picon

[PATCH] config: Introduce --patience config variable

Some users like the patience diff more than the bare diff. However,
specifying the '--patience' argument every time diff is to be used
is impractical. Moreover, creating an alias doesn't play with other
tools nice, e.g. git-show; Therefore we need a configuration variable
to turn this on/off across whole git tools.
---
Please keep me CC'ed as I am not signed into list.

 Documentation/diff-config.txt |    3 +++
 diff.c                        |    9 +++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 1aed79e..b72b2fd 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
 <at>  <at>  -86,6 +86,9  <at>  <at>  diff.mnemonicprefix::
 diff.noprefix::
 	If set, 'git diff' does not show any source or destination prefix.

+diff.patience:
+    If set, 'git diff' will use patience algorithm.
+
 diff.renameLimit::
 	The number of files to consider when performing the copy/rename
 	detection; equivalent to the 'git diff' option '-l'.
diff --git a/diff.c b/diff.c
index a1c06b5..8940d00 100644
--- a/diff.c
+++ b/diff.c
(Continue reading)

Jeff King | 6 Mar 12:49 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

On Tue, Mar 06, 2012 at 11:59:42AM +0100, Michal Privoznik wrote:

> Some users like the patience diff more than the bare diff. However,
> specifying the '--patience' argument every time diff is to be used
> is impractical. Moreover, creating an alias doesn't play with other
> tools nice, e.g. git-show; Therefore we need a configuration variable
> to turn this on/off across whole git tools.

The idea of turning on patience diff via config makes sense to me, and
it is not a problem for plumbing tools to have patience diff on, since
patience diffs are syntactically identical to non-patience diffs. So I
think the intent is good.

A few comments on the patch itself:

> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
>  <at>  <at>  -86,6 +86,9  <at>  <at>  diff.mnemonicprefix::
>  diff.noprefix::
>  	If set, 'git diff' does not show any source or destination prefix.
>  
> +diff.patience:
> +    If set, 'git diff' will use patience algorithm.
> +

Should this be a boolean? Or should we actually have a diff.algorithm
option where you specify the algorithm you want (e.g., "diff.algorithm =
patience")? That would free us up later to more easily add new values.

In particular, I am thinking about --minimal. It is mutually exclusive
(Continue reading)

Thomas Rast | 6 Mar 14:01 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

Jeff King <peff <at> peff.net> writes:

> On Tue, Mar 06, 2012 at 11:59:42AM +0100, Michal Privoznik wrote:
>
>> --- a/Documentation/diff-config.txt
>> +++ b/Documentation/diff-config.txt
>>  <at>  <at>  -86,6 +86,9  <at>  <at>  diff.mnemonicprefix::
>>  diff.noprefix::
>>  	If set, 'git diff' does not show any source or destination prefix.
>>  
>> +diff.patience:
>> +    If set, 'git diff' will use patience algorithm.
>> +
>
> Should this be a boolean? Or should we actually have a diff.algorithm
> option where you specify the algorithm you want (e.g., "diff.algorithm =
> patience")? That would free us up later to more easily add new values.
>
> In particular, I am thinking about --minimal. It is mutually exclusive
> with --patience, and is simply ignored if you use patience diff.
> we perhaps have "diff.algorithm" which can be one of "myers", "minimal"
> (which is really myers + the minimal flag), and "patience".

Don't forget "histogram".  I have no idea why it's not documented
(evidently 8c912eea slipped through the review cracks) but --histogram
is supported since 1.7.7.

--

-- 
Thomas Rast
trast <at> {inf,student}.ethz.ch
(Continue reading)

Thomas Rast | 6 Mar 14:15 2012
Picon

[PATCH 1/2] perf: compare diff algorithms

8c912ee (teach --histogram to diff, 2011-07-12) claimed histogram diff
was faster than both Myers and patience.

We have since incorporated a performance testing framework, so add a
test that compares the various diff tasks performed in a real 'log -p'
workload.  This does indeed show that histogram diff slightly beats
Myers, while patience is much slower than the others.

Signed-off-by: Thomas Rast <trast <at> student.ethz.ch>
---

The 3000 is pretty arbitrary but makes for a nice test duration.

I'm reluctant to put numbers into the message, since the whole point
of the perf test framework is that you can easily get them too.  But
here's what I'm seeing:

  4000.1: log -3000 (baseline)          0.04(0.02+0.01)                                                     
  4000.2: log --raw -3000 (tree-only)   0.49(0.38+0.09)                                                     
  4000.3: log -p -3000 (Myers)          1.93(1.75+0.17)
  4000.4: log -p -3000 --histogram      1.90(1.74+0.15)
  4000.5: log -p -3000 --patience       2.25(2.07+0.16)

 t/perf/p4000-diff-algorithms.sh |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100755 t/perf/p4000-diff-algorithms.sh

diff --git a/t/perf/p4000-diff-algorithms.sh b/t/perf/p4000-diff-algorithms.sh
new file mode 100755
index 0000000..d6e505c
(Continue reading)

Thomas Rast | 6 Mar 14:15 2012
Picon

[PATCH 2/2] Document the --histogram diff option

Signed-off-by: Thomas Rast <trast <at> student.ethz.ch>
---

This is only the minimal update.  I think in the long run, we should
add a note saying why we support all of them.  BUt off hand I didn't
have any substantial evidence in favour of patience that could be used
as an argument.

 Documentation/diff-options.txt |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 87f0a5f..7d4566f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
 <at>  <at>  -52,6 +52,9  <at>  <at>  endif::git-format-patch[]
 --patience::
 	Generate a diff using the "patience diff" algorithm.

+--histogram::
+	Generate a diff using the "histogram diff" algorithm.
+
 --stat[=<width>[,<name-width>[,<count>]]]::
 	Generate a diffstat. By default, as much space as necessary
 	will be used for the filename part, and the rest for the graph
--

-- 
1.7.9.2.467.g7fee4

Junio C Hamano | 6 Mar 20:57 2012
Picon
Picon

Re: [PATCH 2/2] Document the --histogram diff option

Thomas Rast <trast <at> student.ethz.ch> writes:

> Signed-off-by: Thomas Rast <trast <at> student.ethz.ch>
> ---
>
> This is only the minimal update.  I think in the long run, we should
> add a note saying why we support all of them.  But off hand I didn't
> have any substantial evidence in favour of patience that could be used
> as an argument.

Isn't the main argument made by proponents of patience diff is more
readable output, and not performance?  That line of argument relies
on a fairly subjective test "which one is easier to read?", so it is
hard to come up with a substantial evidence, unless somebody invests
in A/B test.

Thanks, will queue for 1.7.7.x maintenance track and upwards.
Thomas Rast | 6 Mar 21:42 2012
Picon

Re: [PATCH 2/2] Document the --histogram diff option

Junio C Hamano <gitster <at> pobox.com> writes:

> Thomas Rast <trast <at> student.ethz.ch> writes:
>
>> Signed-off-by: Thomas Rast <trast <at> student.ethz.ch>
>> ---
>>
>> This is only the minimal update.  I think in the long run, we should
>> add a note saying why we support all of them.  But off hand I didn't
>> have any substantial evidence in favour of patience that could be used
>> as an argument.
>
> Isn't the main argument made by proponents of patience diff is more
> readable output, and not performance?  That line of argument relies
> on a fairly subjective test "which one is easier to read?", so it is
> hard to come up with a substantial evidence, unless somebody invests
> in A/B test.

Well, I was just too lazy to look up what I dimly remembered people had
posted at some point: examples where patience beats Myers for
readability.  E.g.,

  http://article.gmane.org/gmane.comp.version-control.git/104316

I don't think you need a blind test to justify that the patience result
is more readable.  So I think in the long run, the docs should say
something like:

--diff-algorithm={histogram|myers|minimal|patience}::
       Choose a diff algorithm.  The variants are as follows:
(Continue reading)

Junio C Hamano | 6 Mar 20:52 2012
Picon
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Thomas Rast <trast <at> student.ethz.ch> writes:

> 8c912ee (teach --histogram to diff, 2011-07-12) claimed histogram diff
> was faster than both Myers and patience.
>
> We have since incorporated a performance testing framework, so add a
> test that compares the various diff tasks performed in a real 'log -p'
> workload.  This does indeed show that histogram diff slightly beats
> Myers, while patience is much slower than the others.
>
> Signed-off-by: Thomas Rast <trast <at> student.ethz.ch>
> ---

Fun.

I am getting this (probably unrelated to this patch), by the way:

$ make perf
make -C t/perf/ all
make[1]: Entering directory `/srv/project/git/git.git/t/perf'
rm -rf test-results
./run
...
perf 4 - grep --cached, expensive regex: 1 2 3 ok
# passed all 4 test(s)
1..4
Can't locate Git.pm in  <at> INC ( <at> INC contains: /etc/perl ...) at ./aggregate.perl line 5.
BEGIN failed--compilation aborted at ./aggregate.perl line 5.
Thomas Rast | 6 Mar 22:00 2012
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Junio C Hamano <gitster <at> pobox.com> writes:

>
> I am getting this (probably unrelated to this patch), by the way:
>
> $ make perf
> make -C t/perf/ all
> make[1]: Entering directory `/srv/project/git/git.git/t/perf'
> rm -rf test-results
> ./run
> ...
> perf 4 - grep --cached, expensive regex: 1 2 3 ok
> # passed all 4 test(s)
> 1..4
> Can't locate Git.pm in  <at> INC ( <at> INC contains: /etc/perl ...) at ./aggregate.perl line 5.
> BEGIN failed--compilation aborted at ./aggregate.perl line 5.

It would seem that you are not installing Git.pm as part of your normal
installation?  aggregate.perl uses it, so you'd have to.  Perhaps a Perl
guru can tell us if it's possible to magically pull Git.pm from the
build tree instead of the installed version...

--

-- 
Thomas Rast
trast <at> {inf,student}.ethz.ch
Junio C Hamano | 6 Mar 22:18 2012
Picon
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Thomas Rast <trast <at> inf.ethz.ch> writes:

> Junio C Hamano <gitster <at> pobox.com> writes:
>
>>
>> I am getting this (probably unrelated to this patch), by the way:
>>
>> $ make perf
>> make -C t/perf/ all
>> make[1]: Entering directory `/srv/project/git/git.git/t/perf'
>> rm -rf test-results
>> ./run
>> ...
>> perf 4 - grep --cached, expensive regex: 1 2 3 ok
>> # passed all 4 test(s)
>> 1..4
>> Can't locate Git.pm in  <at> INC ( <at> INC contains: /etc/perl ...) at ./aggregate.perl line 5.
>> BEGIN failed--compilation aborted at ./aggregate.perl line 5.
>
> It would seem that you are not installing Git.pm as part of your normal
> installation?

I actually am installing it in a quite vanilla way.

I think our installation procedure places Git.pm in git specific
perl library path where a simple invocation of "perl" that is
git-unaware will not look into, and we make sure that our scripts
still find the matching version of Git.pm by having "use lib" at the
beginning that points at the right directory.

(Continue reading)

Jakub Narebski | 6 Mar 22:41 2012
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Junio C Hamano <gitster <at> pobox.com> writes:
> Thomas Rast <trast <at> inf.ethz.ch> writes:
>> Junio C Hamano <gitster <at> pobox.com> writes:
>>
>>>
>>> I am getting this (probably unrelated to this patch), by the way:
>>>
>>> $ make perf
>>> make -C t/perf/ all
>>> make[1]: Entering directory `/srv/project/git/git.git/t/perf'
>>> rm -rf test-results
>>> ./run
>>> ...
>>> perf 4 - grep --cached, expensive regex: 1 2 3 ok
>>> # passed all 4 test(s)
>>> 1..4
>>> Can't locate Git.pm in  <at> INC ( <at> INC contains: /etc/perl ...) at ./aggregate.perl line 5.
>>> BEGIN failed--compilation aborted at ./aggregate.perl line 5.
>>
>> It would seem that you are not installing Git.pm as part of your normal
>> installation?
> 
> I actually am installing it in a quite vanilla way.
> 
> I think our installation procedure places Git.pm in git specific
> perl library path where a simple invocation of "perl" that is
> git-unaware will not look into, and we make sure that our scripts
> still find the matching version of Git.pm by having "use lib" at the
> beginning that points at the right directory.
> 
(Continue reading)

Thomas Rast | 7 Mar 13:44 2012
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Jakub Narebski <jnareb <at> gmail.com> writes:

> Junio C Hamano <gitster <at> pobox.com> writes:
>> 
>> But of course, this from a command line would not work:
>> 
>> 	$ perl -MGit
>> 
>> I do not expect it to, and for the ease of testing new versions, I
>> prefer it not to work.
>> 
>> In any case, you should be able to do anything under t/ _before_
>> installing, so relying on having Git.pm in normal  <at> INC is a double
>> no-no.
>
> Thomas, take a look at how it is solved in 't/t9700/test.pl', used by
> 't/t9700-perl-git.sh':
>
>   use lib (split(/:/, $ENV{GITPERLLIB}));

Hum.  The problem is that the user may invoke aggregate.perl manually,
and GITPERLLIB won't be set in that case.

Is there a better solution than duplicating the logic that sets
GITPERLLIB in test-lib.sh within aggregate.perl?

--

-- 
Thomas Rast
trast <at> {inf,student}.ethz.ch
(Continue reading)

Junio C Hamano | 7 Mar 18:45 2012
Picon
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Thomas Rast <trast <at> inf.ethz.ch> writes:

> Jakub Narebski <jnareb <at> gmail.com> writes:
>
>> Thomas, take a look at how it is solved in 't/t9700/test.pl', used by
>> 't/t9700-perl-git.sh':
>>
>>   use lib (split(/:/, $ENV{GITPERLLIB}));
>
> Hum.  The problem is that the user may invoke aggregate.perl manually,
> and GITPERLLIB won't be set in that case.

I would equate "manual invocation" with "the user knows what he is doing",
so if that is the only problem, I think we are good.

> Is there a better solution than duplicating the logic that sets
> GITPERLLIB in test-lib.sh within aggregate.perl?

Perhaps the part to figure out the directory layout can and should
be split out of test-lib.sh into test-env.sh or something and be
dot-sourced at the beginning of test-lib.sh; would that help?

 t/test-env.sh |  112 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/test-lib.sh |  118 ++++-----------------------------------------------------
 2 files changed, 120 insertions(+), 110 deletions(-)

diff --git a/t/test-env.sh b/t/test-env.sh
new file mode 100644
index 0000000..1159c5a
--- /dev/null
(Continue reading)

Jakub Narebski | 7 Mar 19:03 2012
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Thomas Rast wrote:
> Jakub Narebski <jnareb <at> gmail.com> writes:
> > Junio C Hamano <gitster <at> pobox.com> writes:
> >> 
> >> But of course, this from a command line would not work:
> >> 
> >> 	$ perl -MGit
> >> 
> >> I do not expect it to, and for the ease of testing new versions, I
> >> prefer it not to work.
> >> 
> >> In any case, you should be able to do anything under t/ _before_
> >> installing, so relying on having Git.pm in normal  <at> INC is a double
> >> no-no.
> >
> > Thomas, take a look at how it is solved in 't/t9700/test.pl', used by
> > 't/t9700-perl-git.sh':
> >
> >   use lib (split(/:/, $ENV{GITPERLLIB}));
> 
> Hum.  The problem is that the user may invoke aggregate.perl manually,
> and GITPERLLIB won't be set in that case.
> 
> Is there a better solution than duplicating the logic that sets
> GITPERLLIB in test-lib.sh within aggregate.perl?

Beside extracting logic that sets GITPERLLIB into separate file like
in Junio proposal?  You can always assume that it is fixed relative
to perl/Git.pm, and use __DIR__ or $FindBin to make "use lib", e.g.

(Continue reading)

Junio C Hamano | 7 Mar 19:19 2012
Picon
Picon

Re: [PATCH 1/2] perf: compare diff algorithms

Jakub Narebski <jnareb <at> gmail.com> writes:

> Beside extracting logic that sets GITPERLLIB into separate file like
> in Junio proposal?

That was not even a proposal.  The part that deals with valgrind is
blatantly wrong (it shouldn't create symlinks in that code, it only
should figure out where things should be).

> You can always assume that it is fixed relative
> to perl/Git.pm, and use __DIR__ or $FindBin to make "use lib", e.g.
>
>   use FindBin;
>   use lib "$FindBin::Bin/../../perl";

Use of FindBin to find the location of the script is OK, but does
using "../../perl" really work?

We have this

  GITPERLLIB="$GIT_BUILD_DIR"/perl/blib/lib:"$GIT_BUILD_DIR"/perl/blib/arch/auto/Git

to look for two places in test-lib.sh
René Scharfe | 10 Mar 08:13 2012

Re: [PATCH 1/2] perf: compare diff algorithms

Am 06.03.2012 14:15, schrieb Thomas Rast:
> 8c912ee (teach --histogram to diff, 2011-07-12) claimed histogram diff
> was faster than both Myers and patience.
>
> We have since incorporated a performance testing framework, so add a
> test that compares the various diff tasks performed in a real 'log -p'
> workload.  This does indeed show that histogram diff slightly beats
> Myers, while patience is much slower than the others.
>
> Signed-off-by: Thomas Rast<trast <at> student.ethz.ch>
> ---
>
> The 3000 is pretty arbitrary but makes for a nice test duration.
>
> I'm reluctant to put numbers into the message, since the whole point
> of the perf test framework is that you can easily get them too.  But
> here's what I'm seeing:
>
>    4000.1: log -3000 (baseline)          0.04(0.02+0.01)
>    4000.2: log --raw -3000 (tree-only)   0.49(0.38+0.09)
>    4000.3: log -p -3000 (Myers)          1.93(1.75+0.17)
>    4000.4: log -p -3000 --histogram      1.90(1.74+0.15)
>    4000.5: log -p -3000 --patience       2.25(2.07+0.16)

Just a data point: --histogram is slightly slower for me:

    Test                                  this tree
    -----------------------------------------------------
    4000.1: log -3000 (baseline)          0.07(0.07+0.00)
    4000.2: log --raw -3000 (tree-only)   0.35(0.31+0.04)
(Continue reading)

Jeff King | 6 Mar 14:30 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

On Tue, Mar 06, 2012 at 02:01:42PM +0100, Thomas Rast wrote:

> >> --- a/Documentation/diff-config.txt
> >> +++ b/Documentation/diff-config.txt
> >>  <at>  <at>  -86,6 +86,9  <at>  <at>  diff.mnemonicprefix::
> >>  diff.noprefix::
> >>  	If set, 'git diff' does not show any source or destination prefix.
> >>  
> >> +diff.patience:
> >> +    If set, 'git diff' will use patience algorithm.
> >> +
> >
> > Should this be a boolean? Or should we actually have a diff.algorithm
> > option where you specify the algorithm you want (e.g., "diff.algorithm =
> > patience")? That would free us up later to more easily add new values.
> >
> > In particular, I am thinking about --minimal. It is mutually exclusive
> > with --patience, and is simply ignored if you use patience diff.
> > we perhaps have "diff.algorithm" which can be one of "myers", "minimal"
> > (which is really myers + the minimal flag), and "patience".
> 
> Don't forget "histogram".  I have no idea why it's not documented
> (evidently 8c912eea slipped through the review cracks) but --histogram
> is supported since 1.7.7.

Ah, thanks. I had the vague feeling that we had a third algorithm
already, but I didn't see it in the docs. So yeah, I really think this
should be diff.algorithm, with a value of "myers", "patience", or
"histogram" (and possibly "minimal", depending how we want to treat
that).
(Continue reading)

Michal Privoznik | 6 Mar 14:32 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

On 06.03.2012 14:01, Thomas Rast wrote:
> Jeff King <peff <at> peff.net> writes:
> 
>> On Tue, Mar 06, 2012 at 11:59:42AM +0100, Michal Privoznik wrote:
>>
>>> --- a/Documentation/diff-config.txt
>>> +++ b/Documentation/diff-config.txt
>>>  <at>  <at>  -86,6 +86,9  <at>  <at>  diff.mnemonicprefix::
>>>  diff.noprefix::
>>>  	If set, 'git diff' does not show any source or destination prefix.
>>>  
>>> +diff.patience:
>>> +    If set, 'git diff' will use patience algorithm.
>>> +
>>
>> Should this be a boolean? Or should we actually have a diff.algorithm
>> option where you specify the algorithm you want (e.g., "diff.algorithm =
>> patience")? That would free us up later to more easily add new values.
>>
>> In particular, I am thinking about --minimal. It is mutually exclusive
>> with --patience, and is simply ignored if you use patience diff.
>> we perhaps have "diff.algorithm" which can be one of "myers", "minimal"
>> (which is really myers + the minimal flag), and "patience".
> 
> Don't forget "histogram".  I have no idea why it's not documented
> (evidently 8c912eea slipped through the review cracks) but --histogram
> is supported since 1.7.7.
> 

Okay guys. I'll got with diff.algorithm = [patience | minimal |
(Continue reading)

Matthieu Moy | 6 Mar 14:38 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

Michal Privoznik <mprivozn <at> redhat.com> writes:

> Okay guys. I'll got with diff.algorithm = [patience | minimal |
> histogram | myers] then. What I am not sure about is how to threat case
> when user have say algorithm = patience set in config but want to use
> myers. I guess we need --myers option then, don't we?

At this point, maybe it's time to have
--diff-algorithm=[patience|minimal|histogram|myers], and keep
--patience, --minimal and --histogram just as compatibility aliases.

Having one option per algorithm feels wrong ...

--

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
Jeff King | 6 Mar 15:09 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

On Tue, Mar 06, 2012 at 02:38:16PM +0100, Matthieu Moy wrote:

> Michal Privoznik <mprivozn <at> redhat.com> writes:
> 
> > Okay guys. I'll got with diff.algorithm = [patience | minimal |
> > histogram | myers] then. What I am not sure about is how to threat case
> > when user have say algorithm = patience set in config but want to use
> > myers. I guess we need --myers option then, don't we?
> 
> At this point, maybe it's time to have
> --diff-algorithm=[patience|minimal|histogram|myers], and keep
> --patience, --minimal and --histogram just as compatibility aliases.
> 
> Having one option per algorithm feels wrong ...

Yeah, that was going to be my suggestion, too. And explaining it that
way in the docs will make it clear that "--histogram" overrides
"--patience" and so forth.

-Peff
Junio C Hamano | 7 Mar 03:57 2012
Picon
Picon

Re: [PATCH] config: Introduce --patience config variable

Jeff King <peff <at> peff.net> writes:

> On Tue, Mar 06, 2012 at 11:59:42AM +0100, Michal Privoznik wrote:
>
>> Some users like the patience diff more than the bare diff. However,
>> specifying the '--patience' argument every time diff is to be used
>> is impractical. Moreover, creating an alias doesn't play with other
>> tools nice, e.g. git-show; Therefore we need a configuration variable
>> to turn this on/off across whole git tools.
>
> The idea of turning on patience diff via config makes sense to me, and
> it is not a problem for plumbing tools to have patience diff on, since
> patience diffs are syntactically identical to non-patience diffs.

Even though I do not strongly object to the general conclusion, I
have to point out that the last line above is a dangerous thought.

If you change the default diff algorithm, you will invalidate long
term caches that computed their keys based on the shape of patches
produced in the past.  The prime example being the rerere database,
but I wouldn't be surprised if somebody has a notes tree to record
patch ids for existing commits to speed up "git cherry" (hence "git
rebase").  Also kup tool kernel.org folks use to optimize the
uploading of inter-release diff relies on having a stable output
from "git diff A..B", so that the uploader can run the command to
produce a diff locally, GPG sign it, and upload only the signature
and have the k.org side produce the same diff between two tags,
without having to upload the huge patchfile over the wire.

IOW, "syntactically identical so it is OK" is not the right thought
(Continue reading)

Jeff King | 7 Mar 12:47 2012
Picon

Re: [PATCH] config: Introduce --patience config variable

On Tue, Mar 06, 2012 at 06:57:42PM -0800, Junio C Hamano wrote:

> > The idea of turning on patience diff via config makes sense to me, and
> > it is not a problem for plumbing tools to have patience diff on, since
> > patience diffs are syntactically identical to non-patience diffs.
> 
> Even though I do not strongly object to the general conclusion, I
> have to point out that the last line above is a dangerous thought.
> 
> If you change the default diff algorithm, you will invalidate long
> term caches that computed their keys based on the shape of patches
> produced in the past.

I see your point, though I don't think I'd use the word "dangerous" to
describe the invalidation of a cache. In a true cache, we must be ready
for cache misses, so the "danger" is causing extra cache misses. That
might be non-optimal, depending on the cost of a miss, but it's not
going to result in data loss.

[This email ended up long; the "tl;dr" is: "I don't think it's that big a
 deal, but I don't really have a problem limiting this feature to
 just porcelain". Michal, the way to do that would be to move the config
 parsing into git_diff_ui_config instead of git_diff_basic_config.]

> The prime example being the rerere database,

Having said that about caches, I don't know that I'd call rerere a
cache. The "miss" means the user has to resolve the merge. So whether it
is a cache or a database of precious information depends on how much you
value the user's input.
(Continue reading)

Junio C Hamano | 7 Mar 18:24 2012
Picon
Picon

Re: [PATCH] config: Introduce --patience config variable

Jeff King <peff <at> peff.net> writes:

> On Tue, Mar 06, 2012 at 06:57:42PM -0800, Junio C Hamano wrote:
>
>> > The idea of turning on patience diff via config makes sense to me, and
>> > it is not a problem for plumbing tools to have patience diff on, since
>> > patience diffs are syntactically identical to non-patience diffs.
>> 
>> Even though I do not strongly object to the general conclusion, I
>> have to point out that the last line above is a dangerous thought.
>>
>> If you change the default diff algorithm, you will invalidate long
>> term caches that computed their keys based on the shape of patches
>> produced in the past.
>
> I see your point, though I don't think I'd use the word "dangerous" to
> describe the invalidation of a cache.

I probably was not writing clearly enough to avoid getting misread.
The "dangerous" does not refer to "invalidation of a cache".  What I
meant was that "The output is syntactically identical, so it is OK"
is a dangerous way to think when assessing the risk of regression,
because applying to a given preimage and producing the same
postimage is not the *only* way the output is used.

I think the executive summary is that we are in agreement; your
analysis of potential regression coming from differences of the
shape of the patch output (not applicability to a given preimage to
produce the same postimage) seems to match what I said in the
previous message.
(Continue reading)


Gmane