Uniword provides an HTML import pipeline that converts HTML content into OOXML document parts. This enables importing web content, rich text from web editors, and HTML-formatted data into Word documents.
1. Pipeline Overview
The HTML-to-OOXML conversion follows these steps:
- Parse HTML: the HTML input is parsed into a DOM tree.
- Map elements: HTML elements are mapped to their OOXML equivalents.
- Convert styles: inline CSS styles and class-based styles are converted to OOXML formatting.
- Handle images: Base64-encoded or linked images are embedded as document parts.
- Build document: the converted elements are assembled into the document structure.
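The parsing and image-handling steps can be sketched with the Python standard library alone. This is an illustration of the idea, not Uniword's implementation: `Node`, `TreeBuilder`, `parse_html`, and `decode_data_uri` are hypothetical names, and the tree is far simpler than a real DOM.

```python
import base64
from html.parser import HTMLParser


class Node:
    """A minimal DOM node: tag name, attributes, and children (Node or str)."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []


# Elements that never have a closing tag in HTML.
VOID_TAGS = {"img", "br", "hr"}


class TreeBuilder(HTMLParser):
    """Parse step: turn an HTML string into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Whitespace-only text between elements is dropped in this sketch.
        if data.strip():
            self.stack[-1].children.append(data)


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def decode_data_uri(src):
    """Image step: return (mime_type, bytes) for a base64 data URI, else None."""
    if not src.startswith("data:") or ";base64," not in src:
        return None  # a linked image would have to be fetched instead
    header, payload = src.split(";base64,", 1)
    return header[len("data:"):], base64.b64decode(payload)
```

A caller would walk the resulting tree, map each node to an OOXML element, and store the decoded image bytes as a separate document part.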
2. Element Mapping
HTML elements map to OOXML elements as follows:
| HTML | OOXML |
|---|---|
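As a rough illustration of how an element mapping can work, the sketch below converts `<p>` elements into `<w:p>` paragraphs and inline tags into run properties. The specific pairs used here (`b`/`strong` to `<w:b/>`, `i`/`em` to `<w:i/>`, `u` to `<w:u>`) are assumptions chosen for the example, not a statement of which elements Uniword supports, and `ParagraphMapper` and `map_html` are hypothetical names.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumed mapping from inline HTML tags to a formatting flag.
RUN_FLAGS = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u"}

# Run properties are emitted in one fixed order.
FLAG_XML = [
    ("b", "<w:b/>"),
    ("i", "<w:i/>"),
    ("u", '<w:u w:val="single"/>'),
]


class ParagraphMapper(HTMLParser):
    """Map <p> to <w:p> and each text chunk to a <w:r> run."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished <w:p> strings
        self.runs = None      # runs of the open paragraph, or None
        self.active = []      # inline formatting tags currently open

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.runs = []
        elif tag in RUN_FLAGS:
            self.active.append(tag)

    def handle_endtag(self, tag):
        if tag == "p" and self.runs is not None:
            self.paragraphs.append("<w:p>" + "".join(self.runs) + "</w:p>")
            self.runs = None
        elif tag in self.active:
            self.active.remove(tag)

    def handle_data(self, data):
        if self.runs is None:
            return  # text outside a paragraph is ignored in this sketch
        flags = {RUN_FLAGS[tag] for tag in self.active}
        props = "".join(xml for flag, xml in FLAG_XML if flag in flags)
        rpr = f"<w:rPr>{props}</w:rPr>" if props else ""
        text = escape(data)
        self.runs.append(
            f'<w:r>{rpr}<w:t xml:space="preserve">{text}</w:t></w:r>'
        )


def map_html(html):
    mapper = ParagraphMapper()
    mapper.feed(html)
    mapper.close()
    return mapper.paragraphs
```

Calling `map_html("<p>Hello <b>bold <i>both</i></b></p>")` yields a single paragraph with three runs, the last of which carries both bold and italic properties.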
3. Style Conversion
CSS properties are converted to OOXML formatting properties:
- `font-size` maps to `<w:sz>` (half-points)
- `color` maps to `<w:color>` (hex RGB)
- `font-family` maps to `<w:rFonts>`
- `text-align` maps to `<w:jc>` (justification)
- `background-color` maps to `<w:shd>` (shading)
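The conversions listed above can be sketched as a small function that turns an inline `style` attribute into OOXML property elements. This is a minimal sketch under stated assumptions: `convert_style` and its helpers are hypothetical names, only `pt` and `px` sizes and `#rgb`/`#rrggbb` colors are handled, `px` is converted at 0.75 pt per pixel, and shading is applied at run level. Uniword's actual handling may differ.

```python
import re
from xml.sax.saxutils import quoteattr

# CSS text-align keywords and the <w:jc> value each one corresponds to.
JC_VALUES = {"left": "left", "center": "center", "right": "right", "justify": "both"}


def parse_declarations(style):
    """Split 'a: b; c: d' into a dict keyed by lowercase property name."""
    result = {}
    for decl in style.split(";"):
        name, sep, value = decl.partition(":")
        if sep and value.strip():
            result[name.strip().lower()] = value.strip()
    return result


def to_hex(color):
    """Return RRGGBB for '#rgb' or '#rrggbb', else None (named colors are skipped)."""
    match = re.fullmatch(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})", color)
    if not match:
        return None
    digits = match.group(1)
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return digits.upper()


def to_half_points(size):
    """Convert '12pt' or '16px' to half-points, the unit <w:sz> expects."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(pt|px)", size)
    if not match:
        return None
    points = float(match.group(1)) * (0.75 if match.group(2) == "px" else 1.0)
    return round(points * 2)


def convert_style(style):
    """Return run-level (rPr) and paragraph-level (pPr) property elements."""
    css = parse_declarations(style)
    run, para = [], []
    if "font-family" in css:
        # Use the first family in the list and drop surrounding quotes.
        family = quoteattr(css["font-family"].split(",")[0].strip().strip("'\""))
        run.append(f"<w:rFonts w:ascii={family} w:hAnsi={family}/>")
    color = to_hex(css.get("color", ""))
    if color:
        run.append(f'<w:color w:val="{color}"/>')
    half_points = to_half_points(css.get("font-size", ""))
    if half_points:
        run.append(f'<w:sz w:val="{half_points}"/>')
    fill = to_hex(css.get("background-color", ""))
    if fill:
        run.append(f'<w:shd w:val="clear" w:color="auto" w:fill="{fill}"/>')
    jc = JC_VALUES.get(css.get("text-align", "").lower())
    if jc:
        para.append(f'<w:jc w:val="{jc}"/>')
    return {"rPr": run, "pPr": para}
```

For example, a `font-size` of `12pt` becomes `<w:sz w:val="24"/>`, because `<w:sz>` counts in half-points, and `text-align: justify` becomes `<w:jc w:val="both"/>`. The split between the two lists reflects that `<w:jc>` belongs in paragraph properties while the other elements are run properties.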