diff: remove TOO_EXPENSIVE heuristic

Problem reported by Vincent Lefevre in <http://bugs.gnu.org/16848>. The simplest solution is to remove the TOO_EXPENSIVE heuristic that I added to GNU diff in 1993. Although appropriate for circa-1993 hardware, these days the heuristic seems to be more trouble than it's worth. * NEWS: Document this. * doc/diffutils.texi (Overview): Modernize citations. Remove mention of TOO_EXPENSIVE heuristic. * src/analyze.c (diff_2_files): Adjust to TOO_EXPENSIVE-related API changes in gnulib's diffseq module.
2026-01-27 01:44:20 +00:00 · 2014-02-23 22:49:27 -08:00 · 2014-02-23 22:49:27 -08:00 · 9b48bf3d3e
commit 9b48bf3d3e
parent bc51e4bcb4
3 changed files with 17 additions and 18 deletions
--- a/7
+++ b/7
@ -13,6 +13,13 @@ GNU diffutils NEWS                                    -*- outline -*-
  consider two Asian file names to be the same merely because they
  contain no English characters.

+** Performance changes
+
+  diff's default algorithm has been adjusted to output higher-quality
+  results at somewhat greater computational cost, as CPUs have gotten
+  faster since the algorithm was last tweaked in diffutils-2.6 (1993).
+
+
 * Noteworthy changes in release 3.3 (2013-03-24) [stable]

 ** New features
--- a/doc/diffutils.texi
+++ b/doc/diffutils.texi
@ -142,26 +142,26 @@ use diffs to update files.
 David Hayes, Richard Stallman, and Len Tower.  Wayne Davison designed and
 implemented the unified output format.  The basic algorithm is described
 by Eugene W. Myers in ``An O(ND) Difference Algorithm and its Variations'',
-@cite{Algorithmica} Vol.@: 1 No.@: 2, 1986, pp.@: 251--266; and in ``A File
+@cite{Algorithmica} Vol.@: 1, 1986, pp.@: 251--266,
+@url{http://dx.doi.org/10.1007/BF01840446}; and in ``A File
 Comparison Program'', Webb Miller and Eugene W. Myers,
-@cite{Software---Practice and Experience} Vol.@: 15 No.@: 11, 1985,
-pp.@: 1025--1040.
+@cite{Software---Practice and Experience} Vol.@: 15, 1985,
+pp.@: 1025--1040,
+@url{http://dx.doi.org/10.1002/spe.4380151102}.
@c From: "Gene Myers" <gene@cs.arizona.edu>
@c They are about the same basic algorithm; the Algorithmica
@c paper gives a rigorous treatment and the sub-algorithm for
@c delivering scripts and should be the primary reference, but
@c both should be mentioned.
-The algorithm was independently discovered as described by E. Ukkonen in
+The algorithm was independently discovered as described by Esko Ukkonen in
 ``Algorithms for Approximate String Matching'',
-@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118.
+@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118,
+@url{http://dx.doi.org/10.1016/S0019-9958(85)80046-2}.
@c From: "Gene Myers" <gene@cs.arizona.edu>
@c Date: Wed, 29 Sep 1993 08:27:55 MST
@c Ukkonen should be given credit for also discovering the algorithm used
@c in GNU diff.
-Unless the @option{--minimal} option is used, @command{diff} uses a
-heuristic by Paul Eggert that limits the cost to @math{O(N^1.5 log N)}
-at the price of producing suboptimal output for large inputs with many
-differences.  Related algorithms are surveyed by Alfred V. Aho in
+Related algorithms are surveyed by Alfred V. Aho in
 section 6.3 of ``Algorithms for Finding Patterns in Strings'',
@cite{Handbook of Theoretical Computer Science} (Jan Van Leeuwen,
 ed.), Vol.@: A, @cite{Algorithms and Complexity}, Elsevier/MIT Press,
--- a/src/analyze.c
+++ b/src/analyze.c
@ -532,7 +532,6 @@ diff_2_files (struct comparison *cmp)
    {
      struct context ctxt;
      lin diags;
-      lin too_expensive;

      /* Allocate vectors for the results of comparison:
 	 a flag for each line of each file, saying whether that line
@ -564,18 +563,11 @@ diff_2_files (struct comparison *cmp)

      ctxt.heuristic = speed_large_files;

-      /* Set TOO_EXPENSIVE to be approximate square root of input size,
-	 bounded below by 256.  */
-      too_expensive = 1;
-      for (;  diags != 0;  diags >>= 2)
-	too_expensive <<= 1;
-      ctxt.too_expensive = MAX (256, too_expensive);
-
      files[0] = cmp->file[0];
      files[1] = cmp->file[1];

      compareseq (0, cmp->file[0].nondiscarded_lines,
-		  0, cmp->file[1].nondiscarded_lines, minimal, &ctxt);
+		  0, cmp->file[1].nondiscarded_lines, &ctxt);

      free (ctxt.fdiag - (cmp->file[1].nondiscarded_lines + 1));