Using diff

Overview
While revising for some Kubernetes exams, I realised I haven't done a blog post on some useful commands.
This second blog post covers the diff command.

The diff command in Linux is a powerful utility used to compare files line by line. It identifies the differences between two files and outputs these changes in a format that can be used to patch one file to become identical to the other. It is a fundamental tool for software development, system administration, and document management where tracking changes is important.

If you use git a lot, the content will be pleasantly familiar to you :-)

Creating a diff patch
To create a diff file (patch), use the diff command. The basic syntax is:

    diff [options] file1 file2 > patchfile.patch

- file1: The original file.
- file2: The modified file.
- patchfile.patch: The name of the file where the differences will be stored. This file can be named anything, but .patch is a common convention.

Example:

    diff original.txt modified.txt > mychanges.patch

This command compares original.txt and modified.txt and saves the differences in a file named mychanges.patch.
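
One detail that helps when scripting this: diff reports its result through its exit status as well as its output (0 when the files are identical, 1 when they differ, 2 when something went wrong). The following is a minimal sketch, assuming a POSIX shell and throwaway files created just for the illustration:

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\nbeta\n' > original.txt
printf 'alpha\ngamma\n' > modified.txt

# Identical inputs: nothing is written, exit status is 0.
diff original.txt original.txt > same.patch
same_status=$?

# Different inputs: the differences go to the patch file, exit status is 1.
diff original.txt modified.txt > mychanges.patch
diff_status=$?

echo "same: $same_status, different: $diff_status"
cat mychanges.patch
```

Because a status of 1 only means "the files differ", a script running under `set -e` needs to handle it explicitly rather than treat it as a failure.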

Important diff options for creation:

- -u or -U NUM: Creates a unified diff. This is the most common and recommended format. NUM specifies the number of context lines to include around the changes. Using -u without a number defaults to 3 lines. Unified diffs are much easier to read and apply.

    diff -u original.txt modified.txt > mychanges.patch
    diff -U5 original.txt modified.txt > mychanges.patch  # Includes 5 lines of context

- -N or --new-file: Treats absent files as empty. Useful when the change creates new files, and most often combined with -r so that files present in only one directory are included in the patch in full.

    diff -uN original.txt newfile.txt > mychanges.patch  # newfile.txt didn't exist previously

- -r: Recursively compare directories.

    diff -ur directory1 directory2 > directory_changes.patch

- --exclude=PATTERN: Exclude files matching PATTERN when comparing directories recursively. Useful to ignore auto-generated files. Quote the pattern so the shell does not expand it before diff sees it.

    diff -ur --exclude='*.o' directory1 directory2 > directory_changes.patch  # ignores .o files

- -i: Ignore case differences.

    diff -ui original.txt modified.txt > mychanges.patch

- -w: Ignore all whitespace when comparing lines.

    diff -uw original.txt modified.txt > mychanges.patch

- -b: Ignore changes in the amount of whitespace.

    diff -ub original.txt modified.txt > mychanges.patch

- --ignore-all-space: The long form of -w.

    diff -u --ignore-all-space original.txt modified.txt > mychanges.patch

- --suppress-common-lines: Do not print common lines. This option is meant for side-by-side output (-y), where it keeps the listing short and focused on the differences. Use it for reading only: side-by-side output is not a patch and cannot be applied with the patch command.

    diff -y --suppress-common-lines original.txt modified.txt

Choose the options based on the context of the changes and how the patch will be used. For general use, -u (unified diff) is almost always the best choice.

When dealing with directory trees, -ur is crucial. When sharing code for review, include enough context lines (-U) for the reviewer to understand the changes, but not so many that the diff becomes bloated.
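
To make the directory case concrete, here is a small sketch that builds two toy trees and diffs them with -ur, -N and a quoted --exclude pattern. It assumes GNU diff; the file names inside the directories (app.conf, app.o, notes.txt) are invented for the illustration:

```shell
# Build two small directory trees in a scratch location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p directory1/src directory2/src

printf 'version 1\n' > directory1/src/app.conf
printf 'version 2\n' > directory2/src/app.conf
printf 'object code\n' > directory2/src/app.o     # build artefact, should be ignored
printf 'brand new\n' > directory2/src/notes.txt   # exists only in directory2

# -r walks the trees, -N includes files that exist on one side only,
# and the quoted pattern reaches diff without being expanded by the shell.
diff -urN --exclude='*.o' directory1 directory2 > directory_changes.patch

# List the files the patch touches (the timestamp after the tab is cut off).
touched=$(grep '^+++ ' directory_changes.patch | cut -f1)
echo "$touched"
```

The listing should name app.conf and notes.txt but not app.o. Without -N, notes.txt would only be reported as "Only in directory2/src" and its content would be missing from the patch.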

Diff example walkthrough:

- Basic File Comparison:

Let's say you have two files, file1.txt and file2.txt.

file1.txt:

    This is line 1.
    This is line 2.
    This is line 3.
    This is line 4.
    This is line 5.

file2.txt:

    This is line 1.
    This is a modified line 2.
    This is line 3.
    This is a new line.
    This is line 5.

Running diff file1.txt file2.txt will produce output like this:

    2c2
    < This is line 2.
    ---
    > This is a modified line 2.
    4c4
    < This is line 4.
    ---
    > This is a new line.

  - 2c2: Line 2 in file1.txt is changed to line 2 in file2.txt.
  - < This is line 2. : The line from file1.txt that was changed. The < symbol indicates it's from the first file.
  - --- : A separator between the old and the new text.
  - > This is a modified line 2. : The line from file2.txt that is the replacement. The > symbol indicates it's from the second file.
  - 4c4: Line 4 in file1.txt is changed to line 4 in file2.txt. The text "This is a new line." takes the place of "This is line 4.", so diff reports a change (c) rather than an addition (a).
  - < This is line 4. and > This is a new line. : The old line from file1.txt and its replacement from file2.txt.

- Understanding the Symbols:

  - a (add): Lines were added to the second file (file2.txt).
  - c (change): Lines were changed between the two files.
  - d (delete): Lines were deleted from the first file (file1.txt).

The numbers around the letter give the line numbers in the respective files where the change occurred. The general format is line1,line2Xline3,line4, where X is one of a, c, or d.

  - line1,line2 is the range of lines in the first file. If only one line is affected, only line1 appears.
  - line3,line4 is the range of lines in the second file. If only one line is affected, only line3 appears.
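
A quick way to get a feel for these change commands is to produce them on purpose. The sketch below uses made-up files (before.txt, after.txt, old.txt, new.txt) to trigger a pure addition and a change that covers a range of lines:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A pure insertion: after.txt has one extra line after line 2.
printf 'one\ntwo\nthree\n' > before.txt
printf 'one\ntwo\ntwo-and-a-half\nthree\n' > after.txt
added=$(diff before.txt after.txt | head -n 1)

# Two lines in old.txt are replaced by a single line in new.txt.
printf 'a\nb\nc\nd\n' > old.txt
printf 'a\nX\nd\n' > new.txt
changed=$(diff old.txt new.txt | head -n 1)

echo "$added"    # 2a3   -> after line 2 of before.txt, add line 3 of after.txt
echo "$changed"  # 2,3c2 -> lines 2-3 of old.txt become line 2 of new.txt
```

An a only appears when the second file has extra lines; if the new text replaces an existing line, diff reports a c instead.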

- Another Example with Deletion:

file3.txt:

    This is line 1.
    This is line 2.
    This is line 3.
    This is line 4.

file4.txt:

    This is line 1.
    This is line 3.
    This is line 4.

diff file3.txt file4.txt produces:

    2d1
    < This is line 2.

  - 2d1: Line 2 was deleted from file3.txt to create file4.txt. The number after the d is the line in file4.txt after which the deleted text would have appeared.
  - < This is line 2. : The line that was deleted, taken from file3.txt.

- Side-by-Side Output (-y or --side-by-side):

This option makes the output easier to read.

    diff -y file1.txt file2.txt

Output (column spacing shortened here):

    This is line 1.          This is line 1.
    This is line 2.        | This is a modified line 2.
    This is line 3.          This is line 3.
    This is line 4.        | This is a new line.
    This is line 5.          This is line 5.

  - | indicates a changed line.
  - < indicates a line only in the first file.
  - > indicates a line only in the second file.
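
For longer files the side-by-side view is easier to scan when the unchanged rows are hidden. This sketch recreates file1.txt and file2.txt from the walkthrough and assumes GNU diff, whose -W option sets the total output width:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'This is line 1.\nThis is line 2.\nThis is line 3.\nThis is line 4.\nThis is line 5.\n' > file1.txt
printf 'This is line 1.\nThis is a modified line 2.\nThis is line 3.\nThis is a new line.\nThis is line 5.\n' > file2.txt

# Full side-by-side view, limited to 80 columns.
diff -y -W 80 file1.txt file2.txt

# Same comparison, but only the rows that differ are kept.
diff -y -W 80 --suppress-common-lines file1.txt file2.txt > changed_rows.txt
changed_count=$(wc -l < changed_rows.txt | tr -d ' ')
echo "rows that differ: $changed_count"
```

If the width is too narrow for the content, diff truncates the columns, so pick a -W value that fits the longest line you care about.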

- Unified Diff Format (-u):

This format is commonly used for patches because it provides more context. It shows several lines around the changes, making it easier to understand the context of the modification.

    diff -u file1.txt file2.txt

Output (real output also shows a timestamp after each file name):

    --- file1.txt
    +++ file2.txt
    @@ -1,5 +1,5 @@
     This is line 1.
    -This is line 2.
    +This is a modified line 2.
     This is line 3.
    -This is line 4.
    +This is a new line.
     This is line 5.

  - --- file1.txt: Indicates the first file.
  - +++ file2.txt: Indicates the second file.
  - @@ -1,5 +1,5 @@: This is the "hunk header." It means: "From file1.txt, starting at line 1, show 5 lines. In file2.txt, starting at line 1, show 5 lines." The numbers change based on the location and size of the change.
  - -This is line 2. and -This is line 4. : Lines removed from file1.txt.
  - +This is a modified line 2. and +This is a new line. : Lines added to file2.txt.
  - Lines without a + or - are context lines. They are present in both files and immediately surround the changes.
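
The hunk header is easier to read once you see how the amount of context shapes it. The following sketch recreates file1.txt and file2.txt from the walkthrough and prints only the hunk headers, first with no context (-U0) and then with the default three lines:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'This is line 1.\nThis is line 2.\nThis is line 3.\nThis is line 4.\nThis is line 5.\n' > file1.txt
printf 'This is line 1.\nThis is a modified line 2.\nThis is line 3.\nThis is a new line.\nThis is line 5.\n' > file2.txt

# No context lines: each change gets its own hunk.
no_context=$(diff -U0 file1.txt file2.txt | grep '^@@')

# Default context: the two nearby changes are merged into a single hunk.
default_context=$(diff -u file1.txt file2.txt | grep '^@@')

echo "$no_context"
echo "$default_context"
```

With -U0 the headers are `@@ -2 +2 @@` and `@@ -4 +4 @@` (a count of 1 is left out), while -u yields the single header `@@ -1,5 +1,5 @@`. Patches with little or no context are smaller, but patch has less to go on when the target file has drifted.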

- Ignoring Case (-i):

Use -i to ignore case differences. diff -i file_a.txt file_b.txt will treat "Hello" and "hello" as the same.

- Ignoring White Space (-b and -w):

  - -b: Ignores changes in the amount of white space. Treats multiple spaces as a single space.
  - -w: Ignores all white space. This can be useful when comparing code that has been reformatted.
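
The difference between -b and -w is easiest to see on a pair of one-line files. This is a small sketch with invented file names (file_c.txt and file_d.txt are not from the examples above); the helper function status_of is also just for the illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Hello World\n'    > file_a.txt
printf 'hello world\n'    > file_b.txt   # differs only in case
printf 'Hello    World\n' > file_c.txt   # extra spaces between the words
printf 'HelloWorld\n'     > file_d.txt   # no space at all

# Print diff's exit status for the given arguments (0 = same, 1 = different).
status_of() {
    diff "$@" > /dev/null
    echo $?
}

plain_status=$(status_of file_a.txt file_b.txt)          # 1: case matters by default
case_status=$(status_of -i file_a.txt file_b.txt)        # 0: case ignored
amount_status=$(status_of -b file_a.txt file_c.txt)      # 0: one space vs. many is ignored
b_nospace_status=$(status_of -b file_a.txt file_d.txt)   # 1: -b still sees space vs. no space
w_nospace_status=$(status_of -w file_a.txt file_d.txt)   # 0: -w ignores white space entirely

echo "$plain_status $case_status $amount_status $b_nospace_status $w_nospace_status"
```

In other words, -b only forgives a change in how much white space there is, while -w also forgives white space appearing or disappearing.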

- Comparing Directories (-r):

The -r option allows you to recursively compare directories. diff -r dir1 dir2 will compare all files in dir1 and dir2, including files in subdirectories. It will print out the diffs for each file that differs.

- Creating a Patch File:

You can save the output of diff into a "patch" file. This file can then be used to apply the changes to the original file (using the patch command). This is commonly used for distributing code changes.

    diff -u file1.txt file2.txt > my_patch.patch

Then, to apply the patch to file1.txt:

    patch file1.txt < my_patch.patch
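
A full round trip shows that the patch really does turn one file into the other, and that it can be undone. This sketch assumes the patch utility is installed; file1.orig is just a hypothetical name for a pristine copy used in the final check:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'This is line 1.\nThis is line 2.\nThis is line 3.\n' > file1.txt
printf 'This is line 1.\nThis is a modified line 2.\nThis is line 3.\n' > file2.txt
cp file1.txt file1.orig   # pristine copy, used to verify the reverse step

diff -u file1.txt file2.txt > my_patch.patch

# Apply the patch: file1.txt should now be identical to file2.txt.
patch file1.txt < my_patch.patch
if cmp -s file1.txt file2.txt; then applied=yes; else applied=no; fi

# Reverse the patch with -R: file1.txt should be back to its original content.
patch -R file1.txt < my_patch.patch
if cmp -s file1.txt file1.orig; then reverted=yes; else reverted=no; fi

echo "applied: $applied, reverted: $reverted"
```

Checking the result with cmp (or diff -q) after patching is a cheap safeguard, since patch can apply a hunk at an offset when the target has changed since the diff was made.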
Conclusion
diff is a versatile tool for identifying and displaying differences between files. Understanding its various options allows you to tailor the output to your specific needs, whether you're comparing code, configuration files, or text documents.

Remember to consider the context and choose the appropriate options for optimal clarity and efficiency in your comparison tasks. Patch files generated by diff are crucial for distributing changes and updating files efficiently.