
Diff

In computing, diff is a file comparison utility that outputs the differences between two text files. The program's output is also called a diff.

Usage

It is invoked from the command line with the names of two files:

diff firstone.txt secondone.txt

Normal output

The result might look like this:

0a1,3
> This is an important notice! It should
> therefore be located at the beginning of
> this document!
7,12d9
< This paragraph contains text that is
< outdated - it will be deprecated and
< deleted in the near future.
< This is an important notice! It should
< therefore be located at the beginning of
< this document!
14,15c11,14
< spell check this dokument. On the other
< hand, I could do with some shoarma.
---
> spell check this document. On the other
> hand, I could do with some shoarma.
> This paragraph contains important new
> additions to this document.

In this normal diff output, a stands for added, d for deleted and c for changed. Lines prefixed with < come from the first file, and lines prefixed with > come from the second. By default, lines common to both files are not shown. Lines that have moved show up as added at their new location and as deleted at their old location.
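The command lines in the listing above (0a1,3, 7,12d9 and 14,15c11,14) pair a line range in the first file with a line range in the second. The following is a minimal sketch of how such a command could be parsed; the pattern and the function name parse_command are illustrative assumptions, not part of any diff implementation.

```python
import re

# Matches normal-format commands such as "0a1,3", "7,12d9" or "14,15c11,14".
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def parse_command(line):
    """Return (old_start, old_end, action, new_start, new_end), or None.

    parse_command is a hypothetical helper written for this example.
    """
    match = COMMAND.match(line)
    if match is None:
        return None
    old_start, old_end, action, new_start, new_end = match.groups()
    old_start, new_start = int(old_start), int(new_start)
    # A single number stands for a one-line range (or a position).
    old_end = int(old_end) if old_end else old_start
    new_end = int(new_end) if new_end else new_start
    return old_start, old_end, action, new_start, new_end


for command in ["0a1,3", "7,12d9", "14,15c11,14"]:
    print(command, "->", parse_command(command))
```

Numbers to the left of the letter refer to the first file and numbers to the right refer to the second, so 7,12d9 reads as "lines 7 to 12 of the first file were deleted; the files line up again after line 9 of the second".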

Unified format

In unified format (or unidiff), each line that occurs only in the first file is preceded by a minus sign, each line that occurs only in the second file is preceded by a plus sign, and common lines are preceded by a space.

Lines beginning with --- and +++ give the names of the first and second file. Lines enclosed in @@ introduce each hunk and give its starting line and number of lines in each file. This output is often used as input to the patch program.
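For illustration, unified output can be produced with Python's standard difflib module. This is a sketch of the format only, not of how the diff utility itself is implemented; the sample lines are made up, and the file names are borrowed from the usage example.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "delta\n", "gamma\n"]


def unified(old_lines, new_lines):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="firstone.txt",
            tofile="secondone.txt",
        )
    )


for line in unified(old, new):
    print(line, end="")

# Prints:
# --- firstone.txt
# +++ secondone.txt
# @@ -1,3 +1,3 @@
#  alpha
# -beta
# +delta
#  gamma
```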

Binary file support

The first editions of the diff program were designed for line comparisons of text files expecting the newline character to delimit lines. By the 1980s, support for binary files resulted in a shift in the application's design and implementation.

History

The diff program was developed in the early 1970s on the Unix operating system, which was emerging from AT&T Bell Labs in Murray Hill, New Jersey. The final version, first shipped with the 5th Edition of Unix in 1974, was written entirely by Douglas McIlroy. This research was published in a 1976 paper co-written with James W. Hunt, who developed an initial prototype of diff.

McIlroy's work was preceded and influenced by Steve Johnson's comparison program on GECOS and Mike Lesk's proof program. Proof originated on Unix and, like diff, produced line-by-line changes; it even used angle brackets (">" and "<") to present line insertions and deletions in its output. The heuristics used in these early applications were, however, deemed unreliable. The potential usefulness of a diff tool provoked McIlroy into researching and designing a more robust tool that could be used in a variety of tasks yet perform well within the processing and space limitations of the PDP-11's hardware. His approach also drew on collaboration with others at Bell Labs, including Alfred Aho, Elliot Pinson, Jeffrey Ullman, and Harold S. Stone.

In the context of Unix, the use of the ed line editor provided diff with the natural ability to create machine-usable "edit scripts". When such an edit script is saved to a file, ed can apply it to the original file to reconstitute the modified file in its entirety. This greatly reduced the space necessary to maintain multiple versions of a file. McIlroy considered writing a post-processor for diff where a variety of output formats could be designed and implemented, but he found it more frugal and simpler to have diff be responsible for generating the syntax and reverse-order input accepted by the ed command. In 1985, Larry Wall composed a separate utility, patch, that generalized and extended the ability to modify files with diff output.
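The reverse-order input mentioned above matters because every edit refers to line numbers in the original file. The sketch below shows the idea under stated assumptions: the (start, end, replacement) tuple format and the function name apply_edits are invented for this example and are not ed syntax.

```python
def apply_edits(lines, edits):
    """Apply (start, end, replacement) edits to a list of lines.

    start and end are 1-based, inclusive line numbers in the ORIGINAL
    text; end == start - 1 denotes a pure insertion before line start.
    This tuple format is a hypothetical stand-in for an edit script.
    """
    result = list(lines)
    # Work from the bottom of the file upward, so that changing the
    # length of one region never shifts the line numbers that the
    # remaining (earlier) edits refer to.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        result[start - 1:end] = replacement
    return result


original = ["one", "two", "three", "four"]
edits = [
    (1, 0, ["zero"]),           # insert before line 1
    (2, 3, ["2", "3", "3.5"]),  # replace lines 2-3 with three lines
]
print(apply_edits(original, edits))
# ['zero', 'one', '2', '3', '3.5', 'four']
```

Applied top to bottom instead, the insertion before line 1 would push every later line down by one, and the second edit would then hit the wrong lines.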

In diff's early years, common uses included comparing changes in programming language source code and in the source of technical documents, verifying program debugging output, comparing filesystem listings and analyzing computer assembly code. The output targeted at ed was motivated by the need to store a sequence of modifications to a file compactly. The Source Code Control System (SCCS) emerged in the late 1970s as a direct consequence of this development.

One conceptual predecessor of diff is Project Xanadu, a hypertext project established in 1960 that envisioned a version tracking system necessary for its "transpointing windows" feature. As part of this feature, file differences were subsumed in the expansive term "transclusion", in which a document includes parts of other documents or revisions.

In the digital humanities, computer comparison systems are understood to have been created for work on literary texts published in large volumes.

Variations

Most diff implementations remain outwardly unchanged since 1975. The modifications include improvements to the core algorithm, the addition of useful features to the command, and the design of new output formats. The basic algorithm is described in the papers "An O(ND) Difference Algorithm and its Variations" by Eugene W. Myers and "A File Comparison Program" by Webb Miller and Myers. The algorithm was independently discovered, as described in "Algorithms for Approximate String Matching" by Esko Ukkonen.
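Line-based comparison is commonly framed as finding a longest common subsequence (LCS) of the two files: lines in the LCS are common, and everything else is a deletion or an insertion. The sketch below uses the textbook quadratic dynamic-programming table. It is deliberately not the O(ND) algorithm from the papers above, nor the original Unix implementation; the function name lcs_diff is an assumption of this example.

```python
def lcs_diff(a, b):
    """Return a list of (tag, line) pairs; tag is ' ', '-' or '+'."""
    n, m = len(a), len(b)
    # length[i][j] holds the length of the LCS of a[i:] and b[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    # Walk the table from the top-left corner to recover the edits.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i, j = i + 1, j + 1
        elif length[i + 1][j] >= length[i][j + 1]:
            result.append(("-", a[i]))
            i += 1
        else:
            result.append(("+", b[j]))
            j += 1
    # Whatever remains in either file is a trailing deletion or insertion.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result


for tag, line in lcs_diff(["alpha", "beta", "gamma"], ["alpha", "delta", "gamma"]):
    print(tag + line)
```

The table needs time and memory proportional to the product of the two file lengths, which is the kind of cost that more refined algorithms are designed to avoid when the files are mostly alike.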

The postprocessors sdiff and diffmk render side-by-side diff listings and apply change marks to printed documents, respectively. Both were developed elsewhere in Bell Labs in or before 1981.

The Berkeley distribution of Unix added the context format (-c) and the ability to recurse on filesystem directory structures (-r) in 2.8 BSD, released in July 1981.

The context format of diff introduced at Berkeley helped with distributing patches for source code that may have been changed minimally.

Diff3 compares one file against two other files. It was originally developed by Paul Jensen to reconcile changes made by two persons editing a common source. It is seldom invoked directly and is largely subsumed by the merge program. However, it is used internally by many revision control systems.

Unified context diffs were originally developed by Wayne Davison in August 1990 (in unidiff, which appeared in Volume 14 of comp.sources.misc). Richard Stallman added unified diff support to the GNU Project's diff utility one month later, and the feature debuted in GNU diff 1.15, released in January 1991. GNU diff has since generalized the context format to allow arbitrary formatting of diffs. GNU diff is included in the diffutils package with other diff and patch related utilities.

Free software implementations

The GNU Project has an implementation of diff (and diff3) that is available from the GNU diffutils package.

Several tools on various platforms use the GNU diffutils engine and provide a graphical display, and some combine editing and merging capabilities. The following are some of these free tools.

  • Emacs - provided by Ediff mode
  • VimDiff [1]
  • gtkdiff [2]
  • KDiff3 [3]
  • kompare
  • Meld
  • tkdiff [4]
  • WinMerge - Comparison tool for Windows.
  • xxdiff [5]
  • fldiff [6]

This page about Diff includes information from a Wikipedia article.
