diff --git a/NEWS b/NEWS index aff7a9d..58a7cbb 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,13 @@ GNU diffutils NEWS -*- outline -*- consider two Asian file names to be the same merely because they contain no English characters. +** Performance changes + + diff's default algorithm has been adjusted to output higher-quality + results at somewhat greater computational cost, as CPUs have gotten + faster since the algorithm was last tweaked in diffutils-2.6 (1993). + + * Noteworthy changes in release 3.3 (2013-03-24) [stable] ** New features diff --git a/doc/diffutils.texi b/doc/diffutils.texi index 1d48f54..7b08cd4 100644 --- a/doc/diffutils.texi +++ b/doc/diffutils.texi @@ -142,26 +142,26 @@ use diffs to update files. David Hayes, Richard Stallman, and Len Tower. Wayne Davison designed and implemented the unified output format. The basic algorithm is described by Eugene W. Myers in ``An O(ND) Difference Algorithm and its Variations'', -@cite{Algorithmica} Vol.@: 1 No.@: 2, 1986, pp.@: 251--266; and in ``A File +@cite{Algorithmica} Vol.@: 1, 1986, pp.@: 251--266, +@url{http://dx.doi.org/10.1007/BF01840446}; and in ``A File Comparison Program'', Webb Miller and Eugene W. Myers, -@cite{Software---Practice and Experience} Vol.@: 15 No.@: 11, 1985, -pp.@: 1025--1040. +@cite{Software---Practice and Experience} Vol.@: 15, 1985, +pp.@: 1025--1040, +@url{http://dx.doi.org/10.1002/spe.4380151102}. @c From: "Gene Myers" @c They are about the same basic algorithm; the Algorithmica @c paper gives a rigorous treatment and the sub-algorithm for @c delivering scripts and should be the primary reference, but @c both should be mentioned. -The algorithm was independently discovered as described by E. Ukkonen in +The algorithm was independently discovered as described by Esko Ukkonen in ``Algorithms for Approximate String Matching'', -@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118. +@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118, +@url{http://dx.doi.org/10.1016/S0019-9958(85)80046-2}. @c From: "Gene Myers" @c Date: Wed, 29 Sep 1993 08:27:55 MST @c Ukkonen should be given credit for also discovering the algorithm used @c in GNU diff. -Unless the @option{--minimal} option is used, @command{diff} uses a -heuristic by Paul Eggert that limits the cost to @math{O(N^1.5 log N)} -at the price of producing suboptimal output for large inputs with many -differences. Related algorithms are surveyed by Alfred V. Aho in +Related algorithms are surveyed by Alfred V. Aho in section 6.3 of ``Algorithms for Finding Patterns in Strings'', @cite{Handbook of Theoretical Computer Science} (Jan Van Leeuwen, ed.), Vol.@: A, @cite{Algorithms and Complexity}, Elsevier/MIT Press, diff --git a/src/analyze.c b/src/analyze.c index 1886e7e..f062fd3 100644 --- a/src/analyze.c +++ b/src/analyze.c @@ -532,7 +532,6 @@ diff_2_files (struct comparison *cmp) { struct context ctxt; lin diags; - lin too_expensive; /* Allocate vectors for the results of comparison: a flag for each line of each file, saying whether that line @@ -564,18 +563,11 @@ diff_2_files (struct comparison *cmp) ctxt.heuristic = speed_large_files; - /* Set TOO_EXPENSIVE to be approximate square root of input size, - bounded below by 256. */ - too_expensive = 1; - for (; diags != 0; diags >>= 2) - too_expensive <<= 1; - ctxt.too_expensive = MAX (256, too_expensive); - files[0] = cmp->file[0]; files[1] = cmp->file[1]; compareseq (0, cmp->file[0].nondiscarded_lines, - 0, cmp->file[1].nondiscarded_lines, minimal, &ctxt); + 0, cmp->file[1].nondiscarded_lines, &ctxt); free (ctxt.fdiag - (cmp->file[1].nondiscarded_lines + 1));