Compare two DOCX files and display their differences.
Uniword provides two diff modes that operate at different levels of abstraction:
- Document-level diff (compare)
  Compares the rendered content of two documents: text, formatting, structure, metadata, and styles. This is what you want when reviewing edits between two versions of a document.
- Package-level diff (package)
  Compares the raw DOCX structure: ZIP parts, XML content, ZIP entry metadata, and OPC packaging compliance. This is what you want when debugging DOCX generation, investigating Word repair behavior, or auditing byte-level changes between two files.
1. When to use which
| Mode | Best for | Example |
|---|---|---|
| compare | Content review, editorial changes, style drift | Comparing draft v1 vs v2 of a report |
| package | DOCX generation debugging, Word repair analysis, structural auditing | Investigating why Word flags "Summary Info 1 is invalid" |
2. compare — Document-level comparison
2.2. Options
--text-only
  Compare text only and skip the formatting comparison.
--json
  Output differences as JSON.
--part
  Focus on a specific part: content, styles, or headers.
--verbose, -v
  Show full text in change listings.
2.3. Examples
# Basic comparison
uniword diff compare old.docx new.docx
# Text-only (ignore formatting)
uniword diff compare old.docx new.docx --text-only
# JSON output for programmatic processing
uniword diff compare old.docx new.docx --json
# Focus on styles
uniword diff compare old.docx new.docx --part styles --verbose
2.4. What it compares
The document-level diff compares five dimensions:
- Text changes
  Added, removed, or modified paragraph text, aligned using LCS (Longest Common Subsequence) matching.
- Format changes
  Paragraph-level (alignment, style) and run-level (bold, italic, font, size, color) formatting differences.
- Structure changes
  Paragraph and table count differences.
- Metadata changes
  Title, author, subject, keywords, description, dates.
- Style changes
  Added, removed, or modified style definitions.
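To make the LCS alignment mentioned under "Text changes" concrete, here is a minimal sketch in plain Ruby. It is an illustration of the general technique, not Uniword's implementation: `lcs_diff` is a hypothetical helper, paragraphs are assumed to be plain strings, and a modified paragraph shows up as a removal followed by an addition.

```ruby
# Align two paragraph lists with a classic LCS table, then walk the table
# to emit :same / :removed / :added operations.
# lcs_diff is a hypothetical helper, not part of Uniword.
def lcs_diff(old_paras, new_paras)
  rows = old_paras.length
  cols = new_paras.length

  # table[i][j] = LCS length of old_paras[i..] and new_paras[j..]
  table = Array.new(rows + 1) { Array.new(cols + 1, 0) }
  (rows - 1).downto(0) do |i|
    (cols - 1).downto(0) do |j|
      table[i][j] =
        if old_paras[i] == new_paras[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end

  ops = []
  i = j = 0
  while i < rows && j < cols
    if old_paras[i] == new_paras[j]
      ops << [:same, old_paras[i]]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      ops << [:removed, old_paras[i]]
      i += 1
    else
      ops << [:added, new_paras[j]]
      j += 1
    end
  end

  # Whatever is left on either side is a pure removal or addition.
  old_paras[i..].each { |para| ops << [:removed, para] }
  new_paras[j..].each { |para| ops << [:added, para] }
  ops
end

old_paras = ["Introduction", "Scope of work", "Conclusion"]
new_paras = ["Introduction", "Revised scope", "Budget", "Conclusion"]

lcs_diff(old_paras, new_paras).each do |kind, text|
  marker = { same: " ", removed: "-", added: "+" }.fetch(kind)
  puts "#{marker} #{text}"
end
```

Unchanged paragraphs act as anchors, so an insertion in the middle of a document does not cause every later paragraph to be reported as changed.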
3. package — Package-level comparison
3.2. Options
--canon
  Use the Canon library for semantic XML comparison. Without this flag, XML that is semantically identical but differs in whitespace or attribute order is reported as "different". With --canon, only semantically meaningful changes are flagged.
--json
  Output differences as JSON.
--verbose, -v
  Show XML change details and ZIP metadata differences.
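The difference between a byte comparison and a semantic one can be sketched with Ruby's bundled REXML library. This is only an illustration of the idea and makes no claim about how Canon works internally: `canonical_form` and `semantically_equal?` are hypothetical helpers, and stripping text nodes is a simplification that would be wrong for whitespace-sensitive content such as runs marked `xml:space="preserve"`.

```ruby
require "rexml/document"

# Reduce an element to nested arrays that ignore attribute order and
# whitespace-only text nodes. Hypothetical helper, not Canon's algorithm.
def canonical_form(element)
  attrs = element.attributes.to_a.map { |a| [a.expanded_name, a.value] }.sort
  children = element.children.filter_map do |node|
    case node
    when REXML::Element
      canonical_form(node)
    when REXML::Text
      text = node.value.strip
      text.empty? ? nil : text
    end
  end
  [element.expanded_name, attrs, children]
end

# Two XML strings are treated as equivalent when their reduced forms match.
def semantically_equal?(xml_a, xml_b)
  canonical_form(REXML::Document.new(xml_a).root) ==
    canonical_form(REXML::Document.new(xml_b).root)
end

a = '<p align="left" style="Body"><r>Hello</r></p>'
b = "<p style=\"Body\" align=\"left\">\n  <r>Hello</r>\n</p>"
c = '<p align="right" style="Body"><r>Hello</r></p>'

puts "bytes equal (a, b):        #{a == b}"
puts "semantically equal (a, b): #{semantically_equal?(a, b)}"
puts "semantically equal (a, c): #{semantically_equal?(a, c)}"
```

Here `a` and `b` differ only in attribute order and indentation, while `c` changes an attribute value, which is the kind of change that should still be flagged.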
3.3. Examples
# Basic structural diff
uniword diff package generated.docx word-repaired.docx
# With Canon semantic comparison (recommended)
uniword diff package generated.docx word-repaired.docx --canon
# Full details
uniword diff package generated.docx word-repaired.docx --canon --verbose
# JSON output
uniword diff package generated.docx word-repaired.docx --canon --json
3.4. What it compares
The package-level diff compares four dimensions:
- ZIP parts (added/removed/modified)
  Lists which ZIP entries were added, removed, or changed. Reports size deltas for each modified part.
- ZIP metadata
  For each modified entry, reports differences in:
  - Compression method (stored vs deflated)
  - Internal attributes (text vs binary flag)
  - Timestamps
- OPC validation
  Checks Open Packaging Convention compliance for both files:
  - Required parts exist ([Content_Types].xml, _rels/.rels, word/document.xml)
  - Content type overrides match existing parts
  - Relationships point to existing parts
- Canon semantic comparison (with --canon)
  Uses the Canon XML comparison library to determine whether XML parts are semantically equivalent. Parts that differ only in whitespace, attribute order, or formatting are reported as [canon: equivalent]. Parts with real content differences are reported as [canon: DIFFERENT] with a summary of what changed.
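The part classification and the "required parts exist" check can be sketched without any ZIP library by assuming the two packages have already been read into hashes of entry name to raw bytes. `classify_parts` and `missing_required_parts` are hypothetical helpers for illustration, not Uniword methods, and the sample contents are made up.

```ruby
# Required parts, as named in the OPC validation list.
REQUIRED_PARTS = ["[Content_Types].xml", "_rels/.rels", "word/document.xml"].freeze

# old_parts / new_parts: Hash of ZIP entry name => raw bytes (String).
# Hypothetical helper: buckets entries and records a size delta per modified part.
def classify_parts(old_parts, new_parts)
  common = old_parts.keys & new_parts.keys
  modified = common.reject { |name| old_parts[name] == new_parts[name] }
  {
    added: new_parts.keys - old_parts.keys,
    removed: old_parts.keys - new_parts.keys,
    modified: modified.to_h do |name|
      [name, new_parts[name].bytesize - old_parts[name].bytesize]
    end,
    unchanged: common - modified
  }
end

# Hypothetical helper: names of required parts absent from a package.
def missing_required_parts(parts)
  REQUIRED_PARTS - parts.keys
end

old_parts = {
  "[Content_Types].xml" => "<Types/>",
  "_rels/.rels" => "<Relationships/>",
  "word/document.xml" => "<document><body/></document>",
  "docProps/app.xml" => "<Properties/>"
}
new_parts = {
  "[Content_Types].xml" => "<Types/>",
  "_rels/.rels" => "<Relationships/>",
  "word/document.xml" => "<document/>",
  "word/settings.xml" => "<settings/>"
}

result = classify_parts(old_parts, new_parts)
result[:added].each { |name| puts "+ #{name}" }
result[:removed].each { |name| puts "- #{name}" }
result[:modified].each { |name, delta| puts format("~ %s (%+d bytes)", name, delta) }
puts "missing required parts: #{missing_required_parts(new_parts).inspect}"
```

Comparing raw bytes is the strictest possible test, which is why a semantic pass such as --canon is useful afterwards to separate cosmetic changes from real ones.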
3.5. Output format
Without --json, the package diff outputs a structured report:
Package diff: generated.docx -> word-repaired.docx
11 part(s) modified
Modified parts:
~ word/document.xml (3093 -> 3016 bytes, -77) [canon: DIFFERENT]
canon: attribute_values: /#document[0]/document[0]/body[0]/p[0] ...
~ word/webSettings.xml (1017 -> 1018 bytes, +1) [canon: equivalent]
ZIP metadata differences:
word/document.xml: internal_attr text -> binary
word/document.xml: timestamp 2026-04-24 ... -> 1980-01-01 ...
With --verbose, XML change details (namespace, attribute, element count
differences) are shown for each modified part.
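When JSON output is not an option, the "Modified parts" lines of the text report can be picked apart with a regular expression. The sketch below assumes the line shape shown in the sample report and entry names without spaces; `MODIFIED_LINE` and `parse_modified_parts` are hypothetical names, and the exact report format is not guaranteed to stay stable.

```ruby
# Matches lines such as:
#   ~ word/document.xml (3093 -> 3016 bytes, -77) [canon: DIFFERENT]
# The [canon: ...] suffix is optional because it only appears with --canon.
MODIFIED_LINE = /
  \A\s*~\s+(?<name>\S+)\s+
  \((?<old>\d+)\s*->\s*(?<new>\d+)\s+bytes,\s*(?<delta>[+-]?\d+)\)
  (?:\s+\[canon:\s*(?<canon>[^\]]+)\])?
/x

# Hypothetical helper: returns one hash per "~" line found in the report text.
def parse_modified_parts(report)
  report.each_line.filter_map do |line|
    m = MODIFIED_LINE.match(line)
    next unless m

    {
      name: m[:name],
      old_size: m[:old].to_i,
      new_size: m[:new].to_i,
      delta: m[:delta].to_i,
      canon: m[:canon]
    }
  end
end

report = <<~'REPORT'
  Package diff: generated.docx -> word-repaired.docx
  11 part(s) modified
  Modified parts:
  ~ word/document.xml (3093 -> 3016 bytes, -77) [canon: DIFFERENT]
  canon: attribute_values: /#document[0]/document[0]/body[0]/p[0] ...
  ~ word/webSettings.xml (1017 -> 1018 bytes, +1) [canon: equivalent]
REPORT

parse_modified_parts(report).each do |part|
  puts "#{part[:name]}: #{part[:delta]} bytes, canon=#{part[:canon]}"
end
```

For anything beyond a quick script, the --json flag is the more robust route, since it avoids depending on the human-readable layout.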
3.6. Ruby API
require "uniword/diff/package_differ"
# Basic package diff
differ = Uniword::Diff::PackageDiffer.new("old.docx", "new.docx")
result = differ.diff
puts result.summary
puts result.modified_parts.map { |p| "#{p.name}: #{p.size_delta} bytes" }
# With Canon semantic comparison
differ = Uniword::Diff::PackageDiffer.new("old.docx", "new.docx",
                                          canon: true)
result = differ.diff
result.modified_parts.each do |part|
  status = part.canon_equivalent ? "equivalent" : "DIFFERENT"
  puts "#{part.name}: canon #{status}"
  puts " #{part.canon_summary}" if part.canon_equivalent == false
end
# JSON output
puts result.to_json
3.7. Result object
The PackageDiffResult returned by diff provides:
| Method | Description | Type |
|---|---|---|
| | ZIP entries present only in the new file | |
| | ZIP entries present only in the old file | |
| modified_parts | Changed parts with size and Canon info | |
| | Identical ZIP entry names | |
| | Namespace, attribute, element count changes | |
| | Compression, text/binary, timestamp diffs | |
| | OPC validation problems | |
| summary | Human-readable one-line summary | |
| | True if no differences found | |
| to_json | JSON serialization | |
| | Hash serialization | |
Each PartChange has:
| Attribute | Description |
|---|---|
| name | ZIP entry path (e.g. …) |
| | Byte sizes in old and new files |
| | |
| | |
| canon_summary | Human-readable Canon diff summary (nil if equivalent or Canon not used) |
| | Array of |
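A common follow-up is to triage modified parts by their Canon status. The sketch below uses a stand-in Struct so it runs without Uniword; it carries only the attribute names that appear in the Ruby API example (`name`, `size_delta`, `canon_equivalent`, `canon_summary`). `PartChangeStub` and `triage` are hypothetical, and treating a nil `canon_equivalent` as "Canon was not used" is an assumption.

```ruby
# Stand-in for the PartChange attributes used in the Ruby API example;
# the real objects come from Uniword.
PartChangeStub = Struct.new(:name, :size_delta, :canon_equivalent, :canon_summary,
                            keyword_init: true)

# Hypothetical helper: group parts by Canon status.
def triage(parts)
  parts.group_by do |part|
    case part.canon_equivalent
    when true then :cosmetic
    when false then :real
    else :unchecked # assumption: nil means Canon was not used
    end
  end
end

parts = [
  PartChangeStub.new(name: "word/document.xml", size_delta: -77,
                     canon_equivalent: false, canon_summary: "attribute_values: ..."),
  PartChangeStub.new(name: "word/webSettings.xml", size_delta: 1,
                     canon_equivalent: true, canon_summary: nil)
]

triage(parts).each do |bucket, changes|
  puts "#{bucket}: #{changes.map(&:name).join(', ')}"
end
```

With real results, the same grouping could be applied to `result.modified_parts`, so that only the `:real` bucket needs manual review.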