Using diff

Jan 10, 2025 linux diff

Overview

While revising for some Kubernetes exams, I realised I haven't done a blog post on some useful commands.

This post, the second in the series, covers the diff command.

The diff command in Linux is a powerful utility used to compare files line by line. It identifies the differences between two files and outputs these changes in a format that can be used to patch one file to become identical to the other. It is a fundamental tool for software development, system administration, and document management where tracking changes is important.

If you use git a lot, the content will be pleasantly familiar to you :-)

Creating a diff patch

To create a diff file (patch), use the diff command. The basic syntax is:

diff [options] file1 file2 > patchfile.patch

file1: The original file.
file2: The modified file.
patchfile.patch: The name of the file where the differences will be stored. This file can be named anything, but .patch is a common convention.

Example:

diff original.txt modified.txt > mychanges.patch

This command compares original.txt and modified.txt and saves the differences in a file named mychanges.patch.
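
One detail worth knowing when scripting this: diff reports its result through its exit status (0 when the files are identical, 1 when differences were found, 2 when something went wrong, such as a missing file). The sketch below is a minimal illustration using throwaway files in a temporary directory; the file contents are made up for the example.

```shell
# Create two small sample files in a scratch directory (hypothetical contents).
workdir=$(mktemp -d)
printf 'alpha\nbeta\n'  > "$workdir/original.txt"
printf 'alpha\ngamma\n' > "$workdir/modified.txt"

# diff exits with 1 when the files differ, so capture the status explicitly
# instead of treating a non-zero exit as a failure.
status=0
diff "$workdir/original.txt" "$workdir/modified.txt" > "$workdir/mychanges.patch" || status=$?

case "$status" in
  0) echo "files are identical" ;;
  1) echo "differences written to mychanges.patch" ;;
  *) echo "diff could not compare the files" >&2 ;;
esac
```

This matters in scripts that run with set -e, where an unhandled exit status of 1 would otherwise stop the script even though diff did its job.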

Important diff options for creation:

-u or -U NUM: Creates a unified diff. This is the most common and recommended format. NUM specifies the number of context lines to include around the changes. Using -u without a number defaults to 3 lines. Unified diffs are much easier to read and apply.
```
diff -u original.txt modified.txt > mychanges.patch
diff -U5 original.txt modified.txt > mychanges.patch # Includes 5 lines of context
```
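
To get a feel for what the context setting does, the following sketch counts the context lines (lines starting with a space) that diff emits for the same one-line change under -u and -U1. The file names and the count_context helper are hypothetical, and seq is assumed to be available.

```shell
# Nine numbered lines, with only line 5 changed in the modified copy.
workdir=$(mktemp -d)
seq 1 9 > "$workdir/original.txt"
sed 's/^5$/five/' "$workdir/original.txt" > "$workdir/modified.txt"

# Hypothetical helper: run diff with the given options and count context lines.
# `|| true` absorbs diff's exit status of 1 (files differ).
count_context() {
  { diff "$@" "$workdir/original.txt" "$workdir/modified.txt" || true; } | grep -c '^ '
}

context_default=$(count_context -u)   # 3 lines of context on each side
context_one=$(count_context -U1)      # 1 line of context on each side

echo "-u gives $context_default context lines, -U1 gives $context_one"
```

With the change in the middle of the file, -u shows three unchanged lines on each side of it and -U1 shows one on each side.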

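-N or --new-file: Treats absent files as empty. This is most useful together with -r: a file that exists in only one of the directory trees is then included in the patch as a full addition or deletion instead of being reported as "Only in ...".

diff -urN directory1 directory2 > directory_changes.patch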
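
As a rough illustration of what -N changes, the sketch below compares two hypothetical directory trees where one file exists only on the second side, first without and then with -N. The directory and file names are invented for the example.

```shell
# Two scratch directory trees; newfile.txt exists only in directory2.
workdir=$(mktemp -d)
mkdir -p "$workdir/directory1" "$workdir/directory2"
printf 'shared\n'    > "$workdir/directory1/common.txt"
printf 'shared\n'    > "$workdir/directory2/common.txt"
printf 'brand new\n' > "$workdir/directory2/newfile.txt"

# Without -N, diff only notes that the file is missing on one side.
without_n=$(cd "$workdir" && diff -ur directory1 directory2 || true)

# With -N, the missing file is treated as empty, so its content shows up as additions.
with_n=$(cd "$workdir" && diff -urN directory1 directory2 || true)

printf 'Without -N:\n%s\n\nWith -N:\n%s\n' "$without_n" "$with_n"
```

Only the second form produces a patch that would actually create newfile.txt when applied.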

-r: Recursively compare directories.

diff -ur directory1 directory2 > directory_changes.patch

--exclude=PATTERN: Exclude files matching PATTERN when comparing directories recursively. Useful to ignore auto-generated files.
```
diff -ur --exclude='*.o' directory1 directory2 > directory_changes.patch # ignores .o files; the quotes stop the shell from expanding the pattern
```
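
Here is a small sketch of --exclude in action, assuming GNU diff. Both trees contain a differing main.c and a differing main.o, and only the source file should appear in the result. All file names and contents are hypothetical.

```shell
# Two scratch trees with a source file and a build artifact each.
workdir=$(mktemp -d)
mkdir -p "$workdir/directory1" "$workdir/directory2"
printf 'int main(void) { return 0; }\n' > "$workdir/directory1/main.c"
printf 'int main(void) { return 1; }\n' > "$workdir/directory2/main.c"
printf 'old object\n' > "$workdir/directory1/main.o"
printf 'new object\n' > "$workdir/directory2/main.o"

# Quote the pattern so diff, not the shell, interprets the wildcard.
patch_text=$(cd "$workdir" && diff -ur --exclude='*.o' directory1 directory2 || true)
printf '%s\n' "$patch_text"
```

The main.o files differ too, but they are skipped entirely, so the patch stays focused on the source change.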

-i: Ignore case differences.

diff -ui original.txt modified.txt > mychanges.patch

-w: Ignore all whitespace when comparing lines.

diff -uw original.txt modified.txt > mychanges.patch

-b: Ignore changes in the amount of whitespace.

diff -ub original.txt modified.txt > mychanges.patch

--ignore-all-space: The long form of -w.

diff -u --ignore-all-space original.txt modified.txt > mychanges.patch
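
The difference between -b and -w is easy to mix up, so here is a minimal sketch that compares three hypothetical one-line files: one with single spaces, one with extra spaces, and one with no spaces at all. The compare helper is invented for the example.

```shell
workdir=$(mktemp -d)
printf 'total = a + b\n'     > "$workdir/spaced.txt"
printf 'total  =  a  +  b\n' > "$workdir/wide.txt"    # same tokens, more spaces
printf 'total=a+b\n'         > "$workdir/tight.txt"   # whitespace removed entirely

# Hypothetical helper: report whether diff considers the files equal.
# Any non-zero exit status (including errors) is reported as "different".
compare() {
  if diff "$@" > /dev/null; then echo same; else echo different; fi
}

b_wide=$(compare -b "$workdir/spaced.txt" "$workdir/wide.txt")    # amount of whitespace differs
b_tight=$(compare -b "$workdir/spaced.txt" "$workdir/tight.txt")  # whitespace present vs absent
w_tight=$(compare -w "$workdir/spaced.txt" "$workdir/tight.txt")  # all whitespace ignored

echo "-b spaced/wide: $b_wide, -b spaced/tight: $b_tight, -w spaced/tight: $w_tight"
```

-b forgives a different amount of whitespace but still notices when whitespace appears where there was none, while -w ignores whitespace completely.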

--suppress-common-lines: Do not print lines that are identical in both files. This option applies to side-by-side output (-y) and reduces it to just the differing lines. It does not make unified diffs smaller, and side-by-side output cannot be applied with patch, so use it for reading differences rather than for creating patch files.
```
diff -y --suppress-common-lines original.txt modified.txt
```

Choose the options based on the context of the changes and how the patch will be used. For general use, -u (unified diff) is almost always the best choice.

When dealing with directory trees, -ur is crucial. When sharing code for review, include enough context lines (-U) for the reviewer to understand the changes, but not so many that the diff becomes bloated.

Diff example walkthrough:

Basic File Comparison: Let's say you have two files, file1.txt and file2.txt.

file1.txt:
```
This is line 1.
This is line 2.
This is line 3.
This is line 4.
This is line 5.
```
file2.txt:
```
This is line 1.
This is a modified line 2.
This is line 3.
This is a new line.
This is line 5.
```
Running diff file1.txt file2.txt will produce output like this:
```
2c2
< This is line 2.
---
> This is a modified line 2.
4c4
< This is line 4.
---
> This is a new line.
```
- 2c2: Line 2 in file1.txt is changed to line 2 in file2.txt.
- < This is line 2.: The line from file1.txt that was changed. The < symbol marks lines from the first file.
- ---: A separator between the old and new versions of a changed block.
- > This is a modified line 2.: The replacement line from file2.txt. The > symbol marks lines from the second file.
- 4c4: Line 4 in file1.txt is changed to line 4 in file2.txt. diff reports a change rather than an addition because "This is a new line." takes the place of "This is line 4."; both files still have five lines.
- < This is line 4. and > This is a new line.: The old and new versions of line 4.

Understanding the Symbols:
- a (add): Lines were added in the second file (file2.txt).
- c (change): Lines were changed between the two files.
- d (delete): Lines were deleted from the first file (file1.txt).

The numbers on either side of the letter are line numbers in the respective files. The general format is line1,line2 followed by a, c, or d, followed by line3,line4:
- line1,line2 is the range of lines in the first file. If only one line is affected, only line1 appears.
- line3,line4 is the range of lines in the second file. If only one line is affected, only line3 appears.
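
To see an a (add) command in isolation, here is a small sketch with two hypothetical files where the second file gains exactly one line and nothing else changes.

```shell
# before.txt has three lines; after.txt has one extra line inserted after "second".
workdir=$(mktemp -d)
printf 'first\nsecond\nthird\n'           > "$workdir/before.txt"
printf 'first\nsecond\ninserted\nthird\n' > "$workdir/after.txt"

# `|| true` absorbs diff's exit status of 1 (files differ).
add_output=$(diff "$workdir/before.txt" "$workdir/after.txt" || true)
printf '%s\n' "$add_output"
```

This prints 2a3 followed by > inserted: after line 2 of before.txt, the text that is line 3 of after.txt was added.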

Another Example with Deletion:

file3.txt:
```
This is line 1.
This is line 2.
This is line 3.
This is line 4.
```
file4.txt:
```
This is line 1.
This is line 3.
This is line 4.
```
diff file3.txt file4.txt produces:
```
2d1
< This is line 2.
```
- 2d1: Line 2 was deleted from file3.txt to create file4.txt. The 1 after the d is the line in file4.txt after which the deleted text would have appeared.
- < This is line 2.: This is the line that was deleted, taken from file3.txt.

Side-by-Side Output (-y or --side-by-side):

This option prints the two files in two columns next to each other, which is often easier to scan than the default output.

diff -y file1.txt file2.txt

This is line 1.                          This is line 1.
This is line 2.                        | This is a modified line 2.
This is line 3.                          This is line 3.
This is line 4.                        | This is a new line.
This is line 5.                          This is line 5.

| indicates a changed line.
< indicates a line only in the first file.
> indicates a line only in the second file.

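For longer files the side-by-side view gets noisy, so it can help to narrow the columns and hide identical lines. The sketch below assumes GNU diff, where -W sets the output width and --suppress-common-lines drops lines that are the same on both sides. The file names and contents are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n'       > "$workdir/left.txt"
printf 'one\nTWO\nthree\nfour\n' > "$workdir/right.txt"

# 40-column side-by-side output, showing only the lines that differ.
differing_only=$(diff -y -W 40 --suppress-common-lines \
  "$workdir/left.txt" "$workdir/right.txt" || true)
printf '%s\n' "$differing_only"
```

Only two rows remain: the changed pair marked with | and the line that exists only in the second file, marked with >.
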
Unified Diff Format (-u):

This format is commonly used for patches because it provides more context. It shows several lines around the changes, making it easier to understand the context of the modification.

diff -u file1.txt file2.txt
```
--- file1.txt
+++ file2.txt
@@ -1,5 +1,5 @@
 This is line 1.
-This is line 2.
+This is a modified line 2.
 This is line 3.
-This is line 4.
+This is a new line.
 This is line 5.
```
- --- file1.txt: Indicates the first file. Real output also appends the file's modification timestamp, which is omitted here.
- +++ file2.txt: Indicates the second file.
- @@ -1,5 +1,5 @@: This is the "hunk header." -1,5 means the hunk covers 5 lines of file1.txt starting at line 1; +1,5 means it covers 5 lines of file2.txt starting at line 1. The numbers change with the location and size of the change.
- -This is line 2.: A line removed from file1.txt.
- +This is a modified line 2.: A line added to file2.txt.
- Lines starting with a space are context lines: they are present in both files and immediately surround the changes.
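
To reproduce this walkthrough yourself, the sketch below recreates file1.txt and file2.txt exactly as listed earlier and prints the hunk. It skips the first two header lines because on a real system they carry timestamps that differ from run to run.

```shell
workdir=$(mktemp -d)

# printf repeats its format for each argument, giving "This is line 1." to "This is line 5."
printf 'This is line %s.\n' 1 2 3 4 5 > "$workdir/file1.txt"
printf '%s\n' \
  'This is line 1.' \
  'This is a modified line 2.' \
  'This is line 3.' \
  'This is a new line.' \
  'This is line 5.' > "$workdir/file2.txt"

# tail -n +3 drops the ---/+++ header lines and keeps the hunk itself.
unified=$(cd "$workdir" && diff -u file1.txt file2.txt | tail -n +3 || true)
printf '%s\n' "$unified"
```

Running something like this is a quick way to check a hand-written diff listing against what diff really prints.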

Ignoring Case (-i):

Use -i to ignore case differences. diff -i file_a.txt file_b.txt will treat "Hello" and "hello" as the same.

Ignoring White Space (-b and -w):
- -b: Ignores changes in the amount of white space. Treats multiple spaces as a single space.
- -w: Ignores all white space. This can be useful when comparing code that has been reformatted.

Comparing Directories (-r):

The -r option allows you to recursively compare directories. diff -r dir1 dir2 will compare all files in dir1 and dir2, including files in subdirectories. It will print out the diffs for each file that differs.
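
When you only want to know which files differ rather than how, GNU diff's -q (--brief) option pairs well with -r. A minimal sketch, with hypothetical directory contents:

```shell
# dir1 and dir2 share readme.txt, differ in sub/config.txt,
# and dir2 has one extra file.
workdir=$(mktemp -d)
mkdir -p "$workdir/dir1/sub" "$workdir/dir2/sub"
printf 'same\n'  > "$workdir/dir1/readme.txt"
printf 'same\n'  > "$workdir/dir2/readme.txt"
printf 'v1\n'    > "$workdir/dir1/sub/config.txt"
printf 'v2\n'    > "$workdir/dir2/sub/config.txt"
printf 'extra\n' > "$workdir/dir2/only_here.txt"

# -r walks subdirectories; -q prints one summary line per difference.
summary=$(cd "$workdir" && diff -rq dir1 dir2 || true)
printf '%s\n' "$summary"
```

Identical files such as readme.txt are not mentioned at all, so the summary contains one line for the changed file and one for the file that exists on only one side.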

Creating a Patch File:

You can save the output of diff into a "patch" file. This file can then be used to apply the changes to the original file (using the patch command). This is commonly used for distributing code changes.

diff -u file1.txt file2.txt > my_patch.patch

Then, to apply the patch to file1.txt:

patch file1.txt < my_patch.patch
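
The whole round trip can be sketched as follows, assuming the patch utility is installed. The file contents are made up, and the file1.orig backup copy is only there so the result can be checked. The last step uses patch -R, which applies a patch in reverse to undo it.

```shell
workdir=$(mktemp -d)
printf 'red\ngreen\nblue\n'  > "$workdir/file1.txt"
printf 'red\nyellow\nblue\n' > "$workdir/file2.txt"
cp "$workdir/file1.txt" "$workdir/file1.orig"   # keep a copy for comparison

# Create the patch; `|| true` absorbs diff's exit status of 1.
(cd "$workdir" && diff -u file1.txt file2.txt > my_patch.patch || true)

# Apply it: file1.txt should now match file2.txt.
(cd "$workdir" && patch file1.txt < my_patch.patch)
if cmp -s "$workdir/file1.txt" "$workdir/file2.txt"; then
  after_apply=identical
else
  after_apply=mismatch
fi

# Undo it with -R: file1.txt should match the original copy again.
(cd "$workdir" && patch -R file1.txt < my_patch.patch)
if cmp -s "$workdir/file1.txt" "$workdir/file1.orig"; then
  after_reverse=restored
else
  after_reverse=mismatch
fi

echo "after apply: $after_apply, after reverse: $after_reverse"
```

cmp -s is used for the checks because it compares silently and reports the result only through its exit status.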

Conclusion

diff is a versatile tool for identifying and displaying differences between files. Understanding its various options allows you to tailor the output to your specific needs, whether you're comparing code, configuration files, or text documents.

Remember to consider the context and choose the appropriate options for optimal clarity and efficiency in your comparison tasks. Patch files generated by diff are crucial for distributing changes and updating files efficiently.