Uniword supports the MHTML (MIME HTML) format for Word 2003+ compatibility. An MHTML file encodes an entire Word document, including text, formatting, and embedded resources, as a single MIME multipart file.
1. MIME Structure
An MHTML file is a MIME multipart document with the following structure:
- MIME headers: Content-Type and boundary definitions
- HTML part: the document content rendered as HTML with Word-specific CSS
- Embedded resources: images and other binary data encoded as Base64 within MIME parts
The format uses the multipart/related MIME type, and each embedded part is referenced from the HTML by a Content-ID (CID) URI.
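To make the layout concrete, the following is a minimal sketch that assembles a multipart/related message by hand using only Ruby's standard library. The helper name `build_mhtml`, the boundary string, and the Content-ID value are assumptions made for illustration; this is not Uniword's own writer, and the output Word produces carries additional headers.

```ruby
require "base64"

# Hypothetical helper (not part of Uniword): builds a minimal
# multipart/related message with one HTML part and one image part.
def build_mhtml(html:, image_bytes:, image_cid:, boundary: "----=_NextPart_example")
  # Base64 body, wrapped at 76 characters per line as MIME expects.
  encoded = Base64.strict_encode64(image_bytes).scan(/.{1,76}/).join("\r\n")

  [
    "MIME-Version: 1.0",
    %(Content-Type: multipart/related; boundary="#{boundary}"; type="text/html"),
    "",
    "--#{boundary}",                       # delimiter opening the HTML part
    %(Content-Type: text/html; charset="utf-8"),
    "Content-Transfer-Encoding: 8bit",
    "",
    html,
    "",
    "--#{boundary}",                       # delimiter opening the image part
    "Content-Type: image/png",
    "Content-Transfer-Encoding: base64",
    "Content-ID: <#{image_cid}>",          # referenced from the HTML as cid:...
    "",
    encoded,
    "",
    "--#{boundary}--",                     # closing delimiter
    ""
  ].join("\r\n")
end

html = %(<html><body><p>Hello</p><img src="cid:image1@example"></body></html>)
mhtml = build_mhtml(html: html, image_bytes: "\x89PNG fake bytes".b, image_cid: "image1@example")
puts mhtml
```

The `cid:image1@example` URI in the HTML matches the `Content-ID: <image1@example>` header of the image part, which is how the HTML part finds its embedded resource.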
2. CSS Generation
Uniword generates Word-compatible CSS from document styles:
- Paragraph styles are translated to CSS block-level properties
- Character styles become inline CSS or <span> styles
- Table styles map to CSS table formatting
- Theme colors and fonts are resolved to CSS values
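As a rough illustration of the paragraph-style and theme-color mappings listed above, the sketch below turns a style record into a CSS rule. The `ParagraphStyle` struct, the `THEME_COLORS` table, and `css_rule_for` are hypothetical names invented for this example, and the color values are placeholders; Uniword's actual style objects and generated CSS may differ.

```ruby
# Hypothetical style record; Uniword's real style classes are not shown here.
ParagraphStyle = Struct.new(:name, :alignment, :space_after_pt, :font, :size_pt, :color,
                            keyword_init: true)

# Illustrative theme table mapping symbolic color names to concrete CSS values.
THEME_COLORS = { accent1: "#4472C4", text1: "#000000" }.freeze

def css_rule_for(style)
  # Resolve theme references; literal colors pass through unchanged.
  color = THEME_COLORS.fetch(style.color, style.color)
  declarations = {
    "text-align"    => style.alignment,
    "margin-bottom" => "#{style.space_after_pt}pt",
    "font-family"   => %("#{style.font}"),
    "font-size"     => "#{style.size_pt}pt",
    "color"         => color
  }
  body = declarations.map { |prop, value| "  #{prop}: #{value};" }.join("\n")
  "p.#{style.name} {\n#{body}\n}"
end

heading = ParagraphStyle.new(name: "Heading1", alignment: "left", space_after_pt: 12,
                             font: "Calibri Light", size_pt: 16, color: :accent1)
puts css_rule_for(heading)
```

Resolving the theme reference at generation time means the emitted CSS holds concrete values, so the HTML part does not depend on a theme definition to render.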
3. DOCX to MHTML Conversion
# Load a DOCX file and convert to MHTML
doc = Uniword.load('document.docx')
doc.save_as('document.mhtml')
# Or use the format handler directly
doc.save('document.mhtml', format: :mhtml)
The conversion process:
- Parses the DOCX package into the Document Layer
- Generates HTML from document elements
- Produces CSS from styles and themes
- Encodes embedded images as Base64 MIME parts
- Wraps everything in a MIME multipart structure
4. MHTML to DOCX Conversion
# Load an MHTML file and convert to DOCX
doc = Uniword.load('document.mhtml')
doc.save('document.docx')
The reverse conversion:
- Parses the MIME multipart structure
- Extracts HTML content and CSS
- Converts HTML elements back to OOXML document parts
- Decodes Base64 resources into image parts
- Packages as a valid DOCX ZIP file
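The first and fourth steps (splitting the MIME structure and decoding Base64 resources) can be sketched with the standard library alone. `parse_mhtml` is a hypothetical helper written for this example, not Uniword's parser, and it assumes well-formed input: it does not handle folded headers, nested multiparts, or quoted-printable bodies.

```ruby
require "base64"

# Hypothetical helper (not Uniword's parser): splits a multipart message
# into parts, each with lower-cased headers and a decoded body.
def parse_mhtml(raw)
  header_block, = raw.split(/\r?\n\r?\n/, 2)
  boundary = header_block[/boundary="?([^";\r\n]+)"?/i, 1]
  raise ArgumentError, "no MIME boundary found" unless boundary

  # Drop the top-level headers, then stop at the closing "--boundary--".
  segments = raw.split("--#{boundary}")[1..]
  segments = segments.take_while { |segment| !segment.start_with?("--") }

  segments.map do |segment|
    head, body = segment.sub(/\A\r?\n/, "").split(/\r?\n\r?\n/, 2)
    headers = head.lines.map(&:strip).reject(&:empty?).to_h do |line|
      name, value = line.split(":", 2)
      [name.downcase, value.to_s.strip]
    end
    # The line break before the next delimiter belongs to the delimiter.
    body = body.to_s.sub(/\r?\n\z/, "")
    body = Base64.decode64(body) if headers["content-transfer-encoding"]&.casecmp?("base64")
    { headers: headers, body: body }
  end
end

sample = [
  "MIME-Version: 1.0",
  'Content-Type: multipart/related; boundary="BOUNDARY42"',
  "",
  "--BOUNDARY42",
  "Content-Type: text/html",
  "",
  "<html><body><p>Hello</p></body></html>",
  "--BOUNDARY42",
  "Content-Type: image/png",
  "Content-Transfer-Encoding: base64",
  "Content-ID: <image1@example>",
  "",
  Base64.strict_encode64("fake image bytes"),
  "--BOUNDARY42--",
  ""
].join("\r\n")

parts = parse_mhtml(sample)
parts.each { |part| puts "#{part[:headers]['content-type']}: #{part[:body].bytesize} bytes" }
```

After this step, the HTML part would go to the HTML-to-OOXML conversion and the decoded image bytes would become image parts in the DOCX package.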
5. Format Auto-Detection
Uniword automatically detects the file format based on file extension and content:
- .docx files are processed by the DOCX handler
- .mhtml or .mht files are processed by the MHTML handler
- Content inspection confirms the detected format
# Format is detected automatically
doc = Uniword.load('document.mhtml') # Uses MHTML handler
doc.save('output.docx') # Writes as DOCX
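One way to combine the extension check with content inspection is sketched below. `detect_format` and its return symbols are hypothetical and not Uniword's API; letting the content check win over the extension when the two disagree is also an assumption of this sketch. It relies on two known facts: a DOCX file is a ZIP package and so begins with the ZIP signature `PK\x03\x04`, and an MHTML file begins with MIME headers.

```ruby
require "tmpdir"

# ZIP local file header signature. Any ZIP archive starts this way, so a
# stricter check would also look for [Content_Types].xml inside the package.
ZIP_MAGIC = "PK\x03\x04".b

# Hypothetical detector (not Uniword's API): extension first, then content.
def detect_format(path)
  by_extension =
    case File.extname(path).downcase
    when ".docx" then :docx
    when ".mhtml", ".mht" then :mhtml
    end

  head = File.binread(path, 1024).to_s
  by_content =
    if head.start_with?(ZIP_MAGIC)
      :docx
    elsif head.match?(/^MIME-Version:/i) || head.match?(%r{^Content-Type:\s*multipart/related}i)
      :mhtml
    end

  # Assumption: when both are available, the content check takes precedence.
  by_content || by_extension
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "report.mht")
  File.write(path, "MIME-Version: 1.0\r\nContent-Type: multipart/related; boundary=\"b\"\r\n\r\n")
  p detect_format(path)
end
```

Reading only the first kilobyte keeps the check cheap for large documents while still covering the ZIP signature and the leading MIME headers.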