diff: remove TOO_EXPENSIVE heuristic

Problem reported by Vincent Lefevre in <http://bugs.gnu.org/16848>.
The simplest solution is to remove the TOO_EXPENSIVE heuristic
that I added to GNU diff in 1993.  Although appropriate for
circa-1993 hardware, these days the heuristic seems to be more
trouble than it's worth.
* NEWS: Document this.
* doc/diffutils.texi (Overview): Modernize citations.
Remove mention of TOO_EXPENSIVE heuristic.
* src/analyze.c (diff_2_files): Adjust to TOO_EXPENSIVE-related
API changes in gnulib's diffseq module.
Paul Eggert 2014-02-23 22:49:27 -08:00
parent bc51e4bcb4
commit 9b48bf3d3e
3 changed files with 17 additions and 18 deletions

NEWS

@@ -13,6 +13,13 @@ GNU diffutils NEWS -*- outline -*-
 consider two Asian file names to be the same merely because they
 contain no English characters.
+** Performance changes
+diff's default algorithm has been adjusted to output higher-quality
+results at somewhat greater computational cost, as CPUs have gotten
+faster since the algorithm was last tweaked in diffutils-2.6 (1993).
 * Noteworthy changes in release 3.3 (2013-03-24) [stable]
 ** New features

doc/diffutils.texi

@@ -142,26 +142,26 @@ use diffs to update files.
 David Hayes, Richard Stallman, and Len Tower. Wayne Davison designed and
 implemented the unified output format. The basic algorithm is described
 by Eugene W. Myers in ``An O(ND) Difference Algorithm and its Variations'',
-@cite{Algorithmica} Vol.@: 1 No.@: 2, 1986, pp.@: 251--266; and in ``A File
+@cite{Algorithmica} Vol.@: 1, 1986, pp.@: 251--266,
+@url{http://dx.doi.org/10.1007/BF01840446}; and in ``A File
 Comparison Program'', Webb Miller and Eugene W. Myers,
-@cite{Software---Practice and Experience} Vol.@: 15 No.@: 11, 1985,
-pp.@: 1025--1040.
+@cite{Software---Practice and Experience} Vol.@: 15, 1985,
+pp.@: 1025--1040,
+@url{http://dx.doi.org/10.1002/spe.4380151102}.
 @c From: "Gene Myers" <gene@cs.arizona.edu>
 @c They are about the same basic algorithm; the Algorithmica
 @c paper gives a rigorous treatment and the sub-algorithm for
 @c delivering scripts and should be the primary reference, but
 @c both should be mentioned.
-The algorithm was independently discovered as described by E. Ukkonen in
+The algorithm was independently discovered as described by Esko Ukkonen in
 ``Algorithms for Approximate String Matching'',
-@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118.
+@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118,
+@url{http://dx.doi.org/10.1016/S0019-9958(85)80046-2}.
 @c From: "Gene Myers" <gene@cs.arizona.edu>
 @c Date: Wed, 29 Sep 1993 08:27:55 MST
 @c Ukkonen should be given credit for also discovering the algorithm used
 @c in GNU diff.
-Unless the @option{--minimal} option is used, @command{diff} uses a
-heuristic by Paul Eggert that limits the cost to @math{O(N^1.5 log N)}
-at the price of producing suboptimal output for large inputs with many
-differences. Related algorithms are surveyed by Alfred V. Aho in
+Related algorithms are surveyed by Alfred V. Aho in
 section 6.3 of ``Algorithms for Finding Patterns in Strings'',
 @cite{Handbook of Theoretical Computer Science} (Jan Van Leeuwen,
 ed.), Vol.@: A, @cite{Algorithms and Complexity}, Elsevier/MIT Press,

src/analyze.c

@@ -532,7 +532,6 @@ diff_2_files (struct comparison *cmp)
 {
 struct context ctxt;
 lin diags;
-lin too_expensive;
 /* Allocate vectors for the results of comparison:
 a flag for each line of each file, saying whether that line
@@ -564,18 +563,11 @@ diff_2_files (struct comparison *cmp)
 ctxt.heuristic = speed_large_files;
-/* Set TOO_EXPENSIVE to be approximate square root of input size,
-bounded below by 256. */
-too_expensive = 1;
-for (; diags != 0; diags >>= 2)
-too_expensive <<= 1;
-ctxt.too_expensive = MAX (256, too_expensive);
 files[0] = cmp->file[0];
 files[1] = cmp->file[1];
 compareseq (0, cmp->file[0].nondiscarded_lines,
-0, cmp->file[1].nondiscarded_lines, minimal, &ctxt);
+0, cmp->file[1].nondiscarded_lines, &ctxt);
 free (ctxt.fdiag - (cmp->file[1].nondiscarded_lines + 1));
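
For reference, the threshold computed by the deleted loop in diff_2_files can be reproduced in isolation. The following is a minimal sketch, not code from the tree: the function name too_expensive_limit and the typedef of lin as long are assumptions made for illustration, and the argument is assumed to be a non-negative diagonal count as in the removed code. The loop doubles the result once per two bits of the input, so it yields a power of two that is at most a factor of two above the square root.

```c
/* Hypothetical stand-in for diff's 'lin' type; an assumption for this sketch. */
typedef long lin;

/* Reproduce the removed computation: an approximate square root of DIAGS,
   rounded up to a power of two and bounded below by 256.
   DIAGS is assumed to be non-negative.  */
static lin
too_expensive_limit (lin diags)
{
  lin too_expensive = 1;

  /* Each iteration drops two bits of DIAGS and doubles the result,
     so the result is 2 raised to half the bit length, rounded up.  */
  for (; diags != 0; diags >>= 2)
    too_expensive <<= 1;

  return too_expensive < 256 ? 256 : too_expensive;
}
```

Under these assumptions, too_expensive_limit (1000000) returns 1024 and any input below 65536 returns the floor of 256. After this commit no such limit is passed to compareseq.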