Uniword converts OOXML documents to MHTML format for Word 2003+ compatibility. The conversion pipeline transforms the OOXML object model into a MIME multipart HTML document.
1. Conversion Steps
- Extract content: The OOXML document model is traversed to extract paragraphs, tables, images, and formatting.
- Generate HTML: Document elements are converted to HTML with Word-compatible markup.
- Generate CSS: OOXML styles and themes are translated to CSS rules.
- Encode images: Binary image data from the DOCX package is encoded as Base64.
- Build MIME structure: HTML, CSS, and embedded resources are wrapped in a MIME multipart/related structure.
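The first four steps can be sketched in a few lines of Python. This is an illustration only, not Uniword's implementation: the `doc` dictionary is a hypothetical stand-in for a parsed OOXML document, and the function names are invented for this example.

```python
import base64
from html import escape

# Hypothetical stand-in for a parsed OOXML document (not Uniword's model).
doc = {
    "paragraphs": [{"text": "Hello & welcome", "bold": True}],
    # Only the 8-byte PNG signature, used here as placeholder image data.
    "images": {"image001": b"\x89PNG\r\n\x1a\n"},
}

def generate_html(paragraphs):
    """Convert each extracted paragraph to a <p> element with escaped text."""
    parts = []
    for p in paragraphs:
        cls = ' class="b"' if p.get("bold") else ""
        parts.append(f"<p{cls}>{escape(p['text'])}</p>")
    return "\n".join(parts)

def generate_css():
    """Emit the CSS rule that backs the class used in generate_html."""
    return "p.b { font-weight: bold; }"

def encode_images(images):
    """Base64-encode each binary image, keyed by its content ID."""
    return {cid: base64.b64encode(data).decode("ascii")
            for cid, data in images.items()}

html_body = generate_html(doc["paragraphs"])
css = generate_css()
encoded = encode_images(doc["images"])
```

Keeping each step a separate function mirrors the pipeline: the HTML, CSS, and encoded images are produced independently and only combined when the MIME structure is built.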
2. MIME Structure
The MHTML output follows this structure:
MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_boundary"

------=_boundary
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<html>
<head>
<style><!-- CSS rules --></style>
</head>
<body>
<!-- HTML content -->
<img src="cid:image001">
</body>
</html>

------=_boundary
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-ID: <image001>

iVBORw0KGgo...base64 data...

------=_boundary--
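A message with this shape can be assembled with Python's standard `email` package. The sketch below is not how Uniword builds its output; the HTML string and the placeholder image bytes are assumptions made for the example, and a real Word-compatible file may need additional headers.

```python
from email.message import EmailMessage

# Illustrative inputs (assumed for this sketch, not produced by Uniword).
html = (
    "<html><head><style>p { margin: 0 }</style></head>"
    '<body><p>Caf\u00e9</p><img src="cid:image001"></body></html>'
)
png_bytes = b"\x89PNG\r\n\x1a\n"  # PNG signature only, as placeholder data

msg = EmailMessage()
# The HTML part: UTF-8 text with quoted-printable transfer encoding.
msg.set_content(html, subtype="html", cte="quoted-printable")
# Adding a related part turns the message into multipart/related;
# binary content is Base64-encoded by default.
msg.add_related(png_bytes, maintype="image", subtype="png", cid="<image001>")
# Use the same boundary string as the structure shown above.
msg.set_boundary("----=_boundary")

mhtml_text = msg.as_string()
```

The `cid:image001` URL in the HTML resolves to the part whose `Content-ID` header is `<image001>`, which is how the image is linked without an external file.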
3. CSS Generation
OOXML formatting is translated to CSS:
- Paragraph formatting: spacing, indentation, and alignment become CSS margin, padding, and text-align properties.
- Character formatting: bold, italic, font-size, and color become CSS font and color properties.
- Table formatting: borders, shading, and cell-padding become CSS table properties.
- Theme colors: Resolved to RGB hex values before CSS generation.
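The mapping above can be illustrated with a small Python sketch. The dictionary keys, the `THEME` palette, and the function names are hypothetical and chosen for this example; they are not Uniword's API. The unit conversions follow OOXML conventions: spacing and indentation are stored in twentieths of a point, and font size in half-points.

```python
# Hypothetical theme palette; real values come from the document's theme.
THEME = {"accent1": "4472C4", "dark1": "000000"}

# OOXML justification values mapped to CSS text-align keywords.
JC_TO_TEXT_ALIGN = {"left": "left", "center": "center",
                    "right": "right", "both": "justify"}

def resolve_color(value, theme=THEME):
    """Return '#RRGGBB' for a theme color name or a literal hex value."""
    return f"#{theme.get(value, value).upper()}"

def twips_to_pt(twips):
    """Convert twentieths of a point to points."""
    return twips / 20

def paragraph_css(fmt):
    """Translate paragraph formatting to CSS declarations."""
    decls = {}
    if "jc" in fmt:
        decls["text-align"] = JC_TO_TEXT_ALIGN[fmt["jc"]]
    if "spacing_before" in fmt:
        decls["margin-top"] = f"{twips_to_pt(fmt['spacing_before']):g}pt"
    if "indent_left" in fmt:
        decls["margin-left"] = f"{twips_to_pt(fmt['indent_left']):g}pt"
    return decls

def run_css(fmt):
    """Translate character formatting to CSS declarations."""
    decls = {}
    if fmt.get("bold"):
        decls["font-weight"] = "bold"
    if fmt.get("italic"):
        decls["font-style"] = "italic"
    if "sz" in fmt:
        decls["font-size"] = f"{fmt['sz'] / 2:g}pt"  # half-points to points
    if "color" in fmt:
        decls["color"] = resolve_color(fmt["color"])
    return decls

def render_rule(selector, decls):
    """Serialize a selector and its declarations as one CSS rule."""
    body = "; ".join(f"{name}: {value}" for name, value in decls.items())
    return f"{selector} {{ {body} }}"

para_rule = render_rule(
    "p.Heading1",
    paragraph_css({"jc": "both", "spacing_before": 240, "indent_left": 720}),
)
run_rule = render_rule(
    "span.Emphasis",
    run_css({"bold": True, "sz": 24, "color": "accent1"}),
)
```

Resolving theme colors inside `resolve_color` before the rule is rendered matches the order described above: the CSS only ever contains concrete RGB hex values.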