From: Vincent Lefevre
Subject: [bug-diffutils] bug#44838: diff 3.7 incorrectly reports added lines and can generate huge diffs
Date: Tue, 24 Nov 2020 12:33:52 +0100
User-agent: Mutt/1.14.5+76 (bb407ec3) vl-127292 (2020-06-24)

I've attached an archive with 2 files "file1" and "file2"; "file2"
is "file1" with some lines removed, so that a diff should report
only removed lines.
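The premise that one file is the other with only lines removed can be checked independently of diff. The following is a minimal sketch, assuming a POSIX shell and awk; "is_subseq" is a hypothetical helper name and is not part of diffutils. It greedily matches the lines of the second file, in order, against the lines of the first.

```shell
# is_subseq A B: succeed (exit 0) if B can be obtained from A by deleting
# whole lines only, i.e. the lines of B occur in A in the same order.
# Hypothetical helper for illustration; not part of diffutils.
is_subseq() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  awk '
    BEGIN {
      b = ARGV[2]; ARGV[2] = ""        # read B ourselves; awk then scans only A
      while ((getline line < b) > 0) want[++n] = line
    }
    # Greedy match: advance in B each time the next wanted line shows up in A.
    # Appending "" forces a string comparison, so "10" and "1e1" stay distinct.
    i < n && ($0 "") == (want[i + 1] "") { i++ }
    END { exit (i == n ? 0 : 1) }
  ' "$1" "$2"
}
```

With the attached files, this would be called as: is_subseq file1 file2 && echo "only removed lines"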
Here are some tests done under Debian/sid (x86_64) with diff 3.7
(Debian package diffutils 1:3.7-3).
First, for reference, the size of the initial diff:
$ diff -u file1 file2 | wc -l
22319
But this diff reports added lines, even though "file2" differs from
"file1" only by removed lines.
──────────────────────────────────────────────────────────────────
$ diff -u file1 file2 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent@vinc17.net> 1404215412 +0000
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
-
-blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent@vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
blob
mark :37951
@@ -9910,21 +467,6 @@
M 100644 :38018 src/round_raw_generic.c
blob
──────────────────────────────────────────────────────────────────
In particular, one can see:
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
and
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
while these lines should have been regarded as unmodified.
This problem disappears if I shorten "file2" a bit by truncating its
end (the lines in question are at the very beginning of "file2", so
such a change of behavior is surprising):
$ head -n 129410 file2 > file3
$ diff -u file1 file3 | grep '^\+'
+++ file3 2020-11-24 11:58:17.922462693 +0100
So now no added lines are reported (the only match is the "+++"
header line). This is fine.
And here's what diff now gives around these lines:
──────────────────────────────────────────────────────────────────
$ diff -u file1 file3 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent@vinc17.net> 1404215412 +0000
data 55
[tests/trandom_deviate.c] Correction (fprintf format).
from :37946
M 100644 :37947 tests/trandom_deviate.c
blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent@vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
-
-blob
mark :37951
data 15
Blob at :37951
@@ -9910,21 +467,6 @@
M 100644 :38018 src/round_raw_generic.c
blob
-mark :38020
-data 15
──────────────────────────────────────────────────────────────────
This is now OK, but stranger things happen when I reduce "file2"
even more:
$ head -n 120200 file2 > file4
$ diff -u file1 file4 | grep -c '^\+'
7
$ diff -u file1 file4 | wc -l
31251
So, with "file2" reduced to 120200 lines, 7 − 1 = 6 added lines
are reported (one of the 7 matches is the "+++" header line), though
this new file has only removed lines. This is incorrect, but if I
remove 100 more lines at the end, it gets much worse, with 81120
added lines reported and a huge diff:
$ head -n 120100 file2 > file5
$ diff -u file1 file5 | grep -c '^\+'
81121
$ diff -u file1 file5 | wc -l
231111
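The manual "head -n" experiments above can be repeated over a range of prefix lengths. The following is a minimal sketch, assuming a POSIX shell, awk and diff; "added_lines" and "scan_prefixes" are hypothetical helper names, and the scratch file name is an arbitrary choice. The counter skips the two-line "---"/"+++" header, so it gives the number of added lines directly without the "− 1" correction.

```shell
# added_lines OLD NEW: count "+" lines in the body of "diff -u OLD NEW",
# skipping the two header lines ("--- OLD" and "+++ NEW").
added_lines() {
  diff -u "$1" "$2" | awk 'NR > 2 && /^[+]/ { c++ } END { print c + 0 }'
}

# scan_prefixes OLD NEW FROM TO STEP: for each N in FROM..TO (by STEP),
# compare OLD with the first N lines of NEW and print "N added-count".
scan_prefixes() {
  n=$3
  scratch="$2.prefix.$$"               # hypothetical scratch file name
  while [ "$n" -le "$4" ]; do
    head -n "$n" "$2" > "$scratch"
    printf '%s %s\n' "$n" "$(added_lines "$1" "$scratch")"
    n=$((n + $5))
  done
  rm -f "$scratch"
}
```

For instance, "scan_prefixes file1 file2 120000 120300 100" would cover the two prefix lengths tested above; any nonzero count in the second column indicates the problem.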
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Attachment: files.tar.xz
Description: Binary data