Uniword takes a pure object-oriented approach: each XML file in the DOCX ZIP package is represented by a dedicated lutaml-model class. Serialization and deserialization are handled by the models themselves rather than by hand-written code, which is what provides round-trip fidelity across load and save.
1. Package Class
The Docx::Package class is the top-level container for all parts of a DOCX file:
class Package < Lutaml::Model::Serializable
  # Metadata (fully modeled)
  attribute :core_properties, CoreProperties # docProps/core.xml
  attribute :app_properties, AppProperties # docProps/app.xml

  # Theme (fully modeled)
  attribute :theme, Theme # word/theme/theme1.xml

  # Document content (in progress)
  attribute :document, Document # word/document.xml
  attribute :styles, StylesConfiguration # word/styles.xml

  # ... other parts

  def self.from_file(path)
    # Load DOCX and deserialize all parts
  end

  def to_file(path)
    # Serialize all parts and package as DOCX
  end
end
2. Key Attributes
Each attribute maps directly to an XML part inside the DOCX ZIP:
| Attribute | XML Part | Description |
|---|---|---|
| core_properties | docProps/core.xml | Dublin Core metadata (title, author, dates) |
| app_properties | docProps/app.xml | Application metadata (pages, words, characters) |
| theme | word/theme/theme1.xml | Theme definition (colors, fonts, formatting) |
| document | word/document.xml | Main document body (paragraphs, tables, sections) |
| styles | word/styles.xml | Style definitions (paragraph, character, table) |
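
The attribute-to-part mapping described in this section can be pictured as a plain lookup. The sketch below is illustrative only: `PART_PATHS` and `part_path_for` are hypothetical names and not part of Uniword's API. The paths themselves are the ones given in the Package class comments.

```ruby
# Hypothetical lookup table (not Uniword API); paths are taken from the
# comments in the Package class definition.
PART_PATHS = {
  core_properties: "docProps/core.xml",
  app_properties:  "docProps/app.xml",
  theme:           "word/theme/theme1.xml",
  document:        "word/document.xml",
  styles:          "word/styles.xml"
}.freeze

# Hypothetical helper: returns the ZIP entry path for a package attribute,
# raising ArgumentError for attributes that have no modeled part.
def part_path_for(attribute)
  PART_PATHS.fetch(attribute) do
    raise ArgumentError, "no XML part modeled for #{attribute.inspect}"
  end
end

puts part_path_for(:theme)
```

Failing loudly on an unknown attribute, rather than returning nil, keeps a misspelled attribute name from silently producing a package with a missing part.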
3. Loading and Saving
# Load an existing DOCX file
package = Docx::Package.from_file('document.docx')
# Access document content
paragraphs = package.document.body.paragraphs
# Modify content
package.document.body.add_paragraph("New paragraph")
# Save back to file
package.to_file('modified.docx')
3.1. Profile-Driven Saving
Assign a Profile to the package before saving to populate DOCX parts with the content Word expects:
profile = Uniword::Docx::Profile.load(:word_2024_en)
package.profile = profile
package.to_file('output.docx')
4. Module Architecture
The Package class is decomposed into focused modules following the single responsibility principle:
- PackageDefaults: Factory methods for constructing minimal DOCX packages (minimal_content_types, minimal_package_rels, minimal_document_rels). Provides the DOCUMENT_TO_PACKAGE_MAPPINGS hash for open/closed copying of document parts.
- PackageSerialization: Content type injection and XML serialization for all package parts. Handles image, chart, header, footer, footnote, bibliography, and custom XML injection. Separates the "what to write" phase from the "how to write it" phase.
This decomposition keeps Package itself focused on loading and delegation (~390 lines), while each module handles one responsibility.
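
As a rough picture of this kind of decomposition, the sketch below splits a toy class into a defaults module and a serialization module. It is a minimal sketch under stated assumptions, not Uniword's implementation: the module names and `minimal_content_types` echo the text, but the method bodies, `SketchPackage`, `parts_to_write`, `write_parts`, and the choice of `extend` versus `include` are all hypothetical.

```ruby
# Illustrative only: names echo the text, bodies are assumptions.
module PackageDefaults
  # Default content-type entries for a minimal package (assumed shape).
  def minimal_content_types
    {
      "rels" => "application/vnd.openxmlformats-package.relationships+xml",
      "xml"  => "application/xml"
    }
  end
end

module PackageSerialization
  # "What to write": select the path => XML pairs that are populated.
  def parts_to_write
    parts.reject { |_path, xml| xml.nil? }
  end

  # "How to write it": hand each pair to a writer (any object with #call).
  def write_parts(writer)
    parts_to_write.each { |path, xml| writer.call(path, xml) }
  end
end

# Hypothetical stand-in for Package; it only loads state and delegates.
class SketchPackage
  extend PackageDefaults        # factory helpers are class-level
  include PackageSerialization  # serialization is instance-level

  attr_reader :parts

  def initialize(parts)
    @parts = parts
  end
end

package = SketchPackage.new({
  "word/document.xml" => "<w:document/>",
  "word/styles.xml"   => nil
})

written = {}
package.write_parts(->(path, xml) { written[path] = xml })
puts written.keys.inspect
```

Because the writer is passed in, the selection step can be exercised with an in-memory hash, as here, without touching a ZIP archive.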
5. Benefits
- Zero hardcoding: All XML generation is handled by lutaml-model, not string concatenation.
- Type safety: Strong typing for all attributes means type errors are caught early.
- Perfect round-trip: Model serialization guarantees that all content is preserved during load/save cycles.
- Easy testing: Each model class is independently testable without needing a full DOCX package.
- Maintainability: Changes to OOXML handling are isolated to model definitions, not scattered across the codebase.