Mirror of https://https.git.savannah.gnu.org/git/diffutils.git (synced 2026-01-27 01:44:20 +00:00)
diff: remove TOO_EXPENSIVE heuristic
Problem reported by Vincent Lefevre in <http://bugs.gnu.org/16848>.
The simplest solution is to remove the TOO_EXPENSIVE heuristic
that I added to GNU diff in 1993. Although appropriate for
circa-1993 hardware, these days the heuristic seems to be more
trouble than it's worth.
* NEWS: Document this.
* doc/diffutils.texi (Overview): Modernize citations.
Remove mention of TOO_EXPENSIVE heuristic.
* src/analyze.c (diff_2_files): Adjust to TOO_EXPENSIVE-related
API changes in gnulib's diffseq module.
parent bc51e4bcb4
commit 9b48bf3d3e
NEWS (7 lines changed)
@@ -13,6 +13,13 @@ GNU diffutils NEWS -*- outline -*-
   consider two Asian file names to be the same merely because they
   contain no English characters.
 
+** Performance changes
+
+  diff's default algorithm has been adjusted to output higher-quality
+  results at somewhat greater computational cost, as CPUs have gotten
+  faster since the algorithm was last tweaked in diffutils-2.6 (1993).
+
+
 * Noteworthy changes in release 3.3 (2013-03-24) [stable]
 
 ** New features
doc/diffutils.texi
@@ -142,26 +142,26 @@ use diffs to update files.
 David Hayes, Richard Stallman, and Len Tower. Wayne Davison designed and
 implemented the unified output format. The basic algorithm is described
 by Eugene W. Myers in ``An O(ND) Difference Algorithm and its Variations'',
-@cite{Algorithmica} Vol.@: 1 No.@: 2, 1986, pp.@: 251--266; and in ``A File
+@cite{Algorithmica} Vol.@: 1, 1986, pp.@: 251--266,
+@url{http://dx.doi.org/10.1007/BF01840446}; and in ``A File
 Comparison Program'', Webb Miller and Eugene W. Myers,
-@cite{Software---Practice and Experience} Vol.@: 15 No.@: 11, 1985,
-pp.@: 1025--1040.
+@cite{Software---Practice and Experience} Vol.@: 15, 1985,
+pp.@: 1025--1040,
+@url{http://dx.doi.org/10.1002/spe.4380151102}.
 @c From: "Gene Myers" <gene@cs.arizona.edu>
 @c They are about the same basic algorithm; the Algorithmica
 @c paper gives a rigorous treatment and the sub-algorithm for
 @c delivering scripts and should be the primary reference, but
 @c both should be mentioned.
-The algorithm was independently discovered as described by E. Ukkonen in
+The algorithm was independently discovered as described by Esko Ukkonen in
 ``Algorithms for Approximate String Matching'',
-@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118.
+@cite{Information and Control} Vol.@: 64, 1985, pp.@: 100--118,
+@url{http://dx.doi.org/10.1016/S0019-9958(85)80046-2}.
 @c From: "Gene Myers" <gene@cs.arizona.edu>
 @c Date: Wed, 29 Sep 1993 08:27:55 MST
 @c Ukkonen should be given credit for also discovering the algorithm used
 @c in GNU diff.
-Unless the @option{--minimal} option is used, @command{diff} uses a
-heuristic by Paul Eggert that limits the cost to @math{O(N^1.5 log N)}
-at the price of producing suboptimal output for large inputs with many
-differences. Related algorithms are surveyed by Alfred V. Aho in
+Related algorithms are surveyed by Alfred V. Aho in
 section 6.3 of ``Algorithms for Finding Patterns in Strings'',
 @cite{Handbook of Theoretical Computer Science} (Jan Van Leeuwen,
 ed.), Vol.@: A, @cite{Algorithms and Complexity}, Elsevier/MIT Press,
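For readers unfamiliar with the O(ND) algorithm cited in the documentation hunk above, the following is a minimal sketch of its greedy forward search, written from the published description. It is not GNU diff's code: the function name `myers_distance` is hypothetical, it compares bytes of two C strings rather than lines, and it returns only the length D of a shortest edit script (insertions plus deletions) without recovering the script itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the number of insertions plus deletions
   in a shortest edit script turning A into B, or -1 if memory
   allocation fails.  Time is O((N+M)D) for edit distance D.  */
static int
myers_distance (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  int n = (int) strlen (a);
  int m = (int) strlen (b);
  int max = n + m;

  /* v[k] holds the furthest x reached so far on diagonal k = x - y.
     Indices run from -max-1 to max+1, hence the offset pointer.
     calloc zeroes the array, which supplies the required v[1] = 0.  */
  int *vbuf = calloc (2 * (size_t) max + 3, sizeof *vbuf);
  if (vbuf == NULL)
    return -1;
  int *v = vbuf + max + 1;
  int result = -1;

  for (int d = 0; d <= max && result < 0; d++)
    for (int k = -d; k <= d; k += 2)
      {
        int x;
        if (k == -d || (k != d && v[k - 1] < v[k + 1]))
          x = v[k + 1];         /* move down: take an element of B */
        else
          x = v[k - 1] + 1;     /* move right: drop an element of A */
        int y = x - k;

        /* Follow the "snake" of matching elements at no cost.  */
        while (x < n && y < m && a[x] == b[y])
          x++, y++;
        v[k] = x;

        if (x >= n && y >= m)
          {
            result = d;
            break;
          }
      }

  free (vbuf);
  return result;
}
```

With the example strings from Myers's paper, `myers_distance ("ABCABBA", "CBABAC")` yields 5. The work grows with the number of differences D, which is why inputs with many differences are the expensive case that the removed heuristic tried to bound.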
src/analyze.c
@@ -532,7 +532,6 @@ diff_2_files (struct comparison *cmp)
 {
   struct context ctxt;
   lin diags;
-  lin too_expensive;
 
   /* Allocate vectors for the results of comparison:
      a flag for each line of each file, saying whether that line
@@ -564,18 +563,11 @@ diff_2_files (struct comparison *cmp)
 
   ctxt.heuristic = speed_large_files;
 
-  /* Set TOO_EXPENSIVE to be approximate square root of input size,
-     bounded below by 256. */
-  too_expensive = 1;
-  for (; diags != 0; diags >>= 2)
-    too_expensive <<= 1;
-  ctxt.too_expensive = MAX (256, too_expensive);
-
   files[0] = cmp->file[0];
   files[1] = cmp->file[1];
 
   compareseq (0, cmp->file[0].nondiscarded_lines,
-              0, cmp->file[1].nondiscarded_lines, minimal, &ctxt);
+              0, cmp->file[1].nondiscarded_lines, &ctxt);
 
   free (ctxt.fdiag - (cmp->file[1].nondiscarded_lines + 1));
 
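The loop removed from diff_2_files approximates a square root using only shifts: each pass divides the size by four and doubles the limit. A standalone sketch of that computation follows, assuming a hypothetical helper name `too_expensive_limit` and plain `long` in place of diff's `lin` type; it reproduces only the arithmetic, not how gnulib's diffseq module used the result.

```c
#include <assert.h>

/* Hypothetical helper mirroring the removed loop: return a power of
   two near the square root of DIAGS, bounded below by 256.  */
static long
too_expensive_limit (long diags)
{
  assert (diags >= 0);
  long too_expensive = 1;

  /* Shifting DIAGS right by two bits per doubling means the result
     roughly doubles each time the input grows by a factor of four.  */
  for (; diags != 0; diags >>= 2)
    too_expensive <<= 1;

  return too_expensive < 256 ? 256 : too_expensive;
}
```

For an input size of 100000 the helper returns 512, against a true square root of about 316; for sizes small enough that the loop yields less than 256, the lower bound applies.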