Module Diff::LCS
In: lib/diff/lcs.rb
lib/diff/lcs/callbacks.rb

Diff::LCS 1.2.1

Computes "intelligent" differences between two sequenced Enumerables. This is an implementation of the McIlroy-Hunt "diff" algorithm for Enumerable objects that include Diffable.

Based on Mario I. Wolczko‘s Smalltalk version (1.2, 1993) and Ned Konz‘s Perl version (Algorithm::Diff 1.15).

Synopsis

  require 'diff/lcs'

  seq1 = %w(a b c e h j l m n p)
  seq2 = %w(b c d e f j k l m r s t)

  lcs = Diff::LCS.lcs(seq1, seq2)
  diffs = Diff::LCS.diff(seq1, seq2)
  sdiff = Diff::LCS.sdiff(seq1, seq2)
  seq = Diff::LCS.traverse_sequences(seq1, seq2, callback_obj)
  bal = Diff::LCS.traverse_balanced(seq1, seq2, callback_obj)
  seq2 == Diff::LCS.patch(seq1, diffs)
  seq2 == Diff::LCS.patch!(seq1, diffs)
  seq1 == Diff::LCS.unpatch(seq2, diffs)
  seq1 == Diff::LCS.unpatch!(seq2, diffs)
  seq2 == Diff::LCS.patch(seq1, sdiff)
  seq2 == Diff::LCS.patch!(seq1, sdiff)
  seq1 == Diff::LCS.unpatch(seq2, sdiff)
  seq1 == Diff::LCS.unpatch!(seq2, sdiff)

Alternatively, objects can be extended with Diff::LCS:

  seq1.extend(Diff::LCS)
  lcs = seq1.lcs(seq2)
  diffs = seq1.diff(seq2)
  sdiff = seq1.sdiff(seq2)
  seq = seq1.traverse_sequences(seq2, callback_obj)
  bal = seq1.traverse_balanced(seq2, callback_obj)
  seq2 == seq1.patch(diffs)
  seq2 == seq1.patch!(diffs)
  seq1 == seq2.unpatch(diffs)
  seq1 == seq2.unpatch!(diffs)
  seq2 == seq1.patch(sdiff)
  seq2 == seq1.patch!(sdiff)
  seq1 == seq2.unpatch(sdiff)
  seq1 == seq2.unpatch!(sdiff)

Default extensions are provided for Array and String objects through the use of ‘diff/lcs/array’ and ‘diff/lcs/string’.

Introduction (by Mark-Jason Dominus)

The following text is from the Perl documentation. The only changes have been to make the text appear better in Rdoc.

I once read an article written by the authors of diff; they said that they hard worked very hard on the algorithm until they found the right one.

I think what they ended up using (and I hope someone will correct me, because I am not very confident about this) was the `longest common subsequence’ method. In the LCS problem, you have two sequences of items:

   a b c d f g h j q z
   a b c d e f g i j k r x y z

and you want to find the longest sequence of items that is present in both original sequences in the same order. That is, you want to find a new sequence S which can be obtained from the first sequence by deleting some items, and from the second sequence by deleting other items. You also want S to be as long as possible. In this case S is:

   a b c d f g j z

From there it‘s only a small step to get diff-like output:

   e   h i   k   q r x y
   +   - +   +   - + + +

This module solves the LCS problem. It also includes a canned function to generate diff-like output.

It might seem from the example above that the LCS of two sequences is always pretty obvious, but that‘s not always the case, especially when the two sequences have many repeated elements. For example, consider

   a x b y c z p d q
   a b c a x b y c z

A naive approach might start by matching up the a and b that appear at the beginning of each sequence, like this:

   a x b y c         z p d q
   a   b   c a b y c z

This finds the common subsequence +a b c z+. But actually, the LCS is +a x b y c z+:

         a x b y c z p d q
   a b c a x b y c z

Author

This version is by Austin Ziegler <austin@rubyforge.org>.

It is based on the Perl Algorithm::Diff (1.15) by Ned Konz , copyright &copy; 2000&ndash;2002 and the Smalltalk diff version by Mario I. Wolczko, copyright &copy; 1993. Documentation includes work by Mark-Jason Dominus.

Licence

Copyright &copy; 2004&ndash;2013 Austin Ziegler This program is free software; you can redistribute it and/or modify it under the same terms as Ruby, or alternatively under the Perl Artistic licence.

Credits

Much of the documentation is taken directly from the Perl Algorithm::Diff implementation and was written originally by Mark-Jason Dominus and later by Ned Konz. The basic Ruby implementation was re-ported from the Smalltalk implementation, available at st.cs.uiuc.edu/pub/Smalltalk/MANCHESTER/manchester/4.0/diff.st

sdiff and traverse_balanced were written for the Perl version by Mike Schilli <m@perlmeister.com>.

"The algorithm is described in A Fast Algorithm for Computing Longest Common Subsequences, CACM, vol.20, no.5, pp.350-353, May 1977, with a few minor improvements to improve the speed."

Methods

Classes and Modules

Module Diff::LCS::Ldiff
Class Diff::LCS::Block
Class Diff::LCS::Change
Class Diff::LCS::ContextChange
Class Diff::LCS::ContextDiffCallbacks
Class Diff::LCS::DefaultCallbacks
Class Diff::LCS::DiffCallbacks
Class Diff::LCS::HTMLDiff
Class Diff::LCS::Hunk
Class Diff::LCS::SDiffCallbacks

Constants

VERSION = '1.2.1'
SequenceCallbacks = DefaultCallbacks   An alias for DefaultCallbacks that is used in Diff::LCS#traverse_sequences.
    Diff::LCS.LCS(seq1, seq2, Diff::LCS::SequenceCallbacks)
BalancedCallbacks = DefaultCallbacks   An alias for DefaultCallbacks that is used in Diff::LCS#traverse_balanced.
    Diff::LCS.LCS(seq1, seq2, Diff::LCS::BalancedCallbacks)

Public Class methods

Public Instance methods

Returns the difference set between self and other. See Diff::LCS#diff.

Returns an Array containing the longest common subsequence(s) between self and other. See Diff::LCS#LCS.

  lcs = seq1.lcs(seq2)

Attempts to patch self with the provided patchset. A new sequence based on self and the patchset will be created. See Diff::LCS#patch. Attempts to autodiscover the direction of the patch.

Attempts to patch self with the provided patchset. A new sequence based on self and the patchset will be created. See Diff::LCS#patch. Does no patch direction autodiscovery.

Attempts to patch self with the provided patchset, using patch!. If the sequence this is used on supports replace, the value of self will be replaced. See Diff::LCS#patch. Does no patch direction autodiscovery.

Returns the balanced ("side-by-side") difference set between self and other. See Diff::LCS#sdiff.

Traverses the discovered longest common subsequences between self and other using the alternate, balanced algorithm. See Diff::LCS#traverse_balanced.

Traverses the discovered longest common subsequences between self and other. See Diff::LCS#traverse_sequences.

unpatch(patchset)

Alias for patch

Attempts to unpatch self with the provided patchset. A new sequence based on self and the patchset will be created. See Diff::LCS#unpatch. Does no patch direction autodiscovery.

Attempts to unpatch self with the provided patchset, using unpatch!. If the sequence this is used on supports replace, the value of self will be replaced. See Diff::LCS#unpatch. Does no patch direction autodiscovery.

[Validate]