Convert HTML content into OOXML paragraphs using the HtmlImporter pipeline.
1. Overview
Uniword::HtmlImporter converts HTML markup into OOXML paragraph objects
that can be added to a document. Internally, the pipeline uses
Uniword::Transformation::HtmlToOoxmlConverter to transform HTML elements
into their Word equivalents.
2. Quick Start
Use the class method for one-shot conversion:
paragraphs = Uniword::HtmlImporter.import('<p>Hello World</p>')
# => Array of Uniword::Wordprocessingml::Paragraph
Add the converted paragraphs to a document:
doc = Uniword::Document.new
paragraphs = Uniword::HtmlImporter.import('<p>First paragraph</p><p>Second</p>')
paragraphs.each { |p| doc.add_element(p) }
doc.save('from_html.docx')
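When the input is plain text rather than ready-made markup, it needs escaping before being wrapped in paragraph tags. The following is a minimal sketch, assuming a hypothetical helper `paragraphs_html` (not part of Uniword) and Ruby's standard `ERB::Util.html_escape`. The import call is guarded so the snippet also runs when the gem is not loaded:

```ruby
require 'erb'

# Hypothetical helper, not part of Uniword: wraps each plain-text line
# in a <p> element, escaping HTML special characters first.
def paragraphs_html(lines)
  lines.map { |line| "<p>#{ERB::Util.html_escape(line)}</p>" }.join
end

html = paragraphs_html(['Fish & chips', 'x < y'])

# Hand the markup to the importer only when Uniword is available.
if defined?(Uniword::HtmlImporter)
  paragraphs = Uniword::HtmlImporter.import(html)
  puts "Imported #{paragraphs.size} paragraphs"
end
```

Escaping first keeps characters such as `&` and `<` from being read as markup by the converter.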
3. Instance-Based Import
For more control, use an instance of the importer:
importer = Uniword::HtmlImporter.new('<h1>Title</h1><p>Content here</p>')
# Convert to paragraphs
paragraphs = importer.import
# Or convert directly to a DocumentRoot
doc = importer.to_document
doc.save('full_doc.docx')
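When several HTML fragments arrive separately, one importer instance per fragment can be combined into a single paragraph list. This is a sketch only: `import_fragments` is a hypothetical helper, and it assumes that `importer.import` returns an array, as in the example above. The importer class is injectable so the helper can be exercised without the gem:

```ruby
# Hypothetical helper, not part of Uniword: imports each fragment with a
# fresh importer instance and returns all resulting paragraphs in order.
# The default class is only resolved when the keyword is omitted.
def import_fragments(fragments, importer_class: Uniword::HtmlImporter)
  fragments.flat_map { |html| importer_class.new(html).import }
end

# Run against the real importer only when Uniword is loaded.
if defined?(Uniword::HtmlImporter)
  paragraphs = import_fragments(['<h1>Title</h1>', '<p>Content here</p>'])
  puts "Imported #{paragraphs.size} paragraphs"
end
```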
4. HTML Elements Supported
The converter maps common HTML elements to OOXML equivalents:
| HTML Element | OOXML Mapping |
|---|---|
| `<p>` | Paragraph |
| `<b>` | Bold run property |
| `<i>` | Italic run property |
| `<u>` | Underline run property |
| `<h1>` to `<h6>` | Heading paragraphs with style |
| `<ol>`, `<ul>` | Numbered/bulleted list paragraphs |
| `<table>` | Table element |
| `<a>` | Hyperlink |
| `<br>` | Line break |
| `<sub>`, `<sup>` | Subscript/superscript |
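Before importing, it can help to check whether the input uses tags outside the supported set. The sketch below is a rough pre-flight check using a regular expression, not a real HTML parser. The tag list is an assumption inferred from the mappings above (including common synonyms such as `strong` and `em`), and `unsupported_tags` is a hypothetical helper, so adjust the list to match what the converter actually handles:

```ruby
# Assumed tag set, inferred from the mapping table; not taken from Uniword.
ASSUMED_SUPPORTED_TAGS = %w[
  p b strong i em u h1 h2 h3 h4 h5 h6
  ul ol li table tr td th a br sub sup
].freeze

# Hypothetical helper: returns the sorted, unique names of opening tags
# in +html+ that are not in the assumed set. Closing tags are skipped
# because the pattern requires a letter right after "<".
def unsupported_tags(html)
  found = html.scan(/<\s*([a-zA-Z][a-zA-Z0-9]*)/).flatten
  found.map(&:downcase).uniq.sort - ASSUMED_SUPPORTED_TAGS
end

puts unsupported_tags('<p>Hi <span>there</span></p>').inspect
```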
5. Pipeline Architecture
HTML String
|
v
HtmlToOoxmlConverter.html_to_paragraphs()
|
v
Array<Uniword::Wordprocessingml::Paragraph>
|
v
Add to Document or call to_document
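The flow above can be sketched as a single function. This assumes that `html_to_paragraphs` is callable on the converter in the same way as `html_to_tables`, and that the document responds to `add_element` as in the Quick Start. `build_document` is a hypothetical name, and both collaborators are passed in so the steps stay visible:

```ruby
# Hypothetical helper mirroring the diagram: HTML string -> paragraphs ->
# document. Neither the name nor the signature comes from Uniword.
def build_document(html, converter:, document:)
  paragraphs = converter.html_to_paragraphs(html)    # step 1: convert
  paragraphs.each { |para| document.add_element(para) } # step 2: attach
  document
end

# Run against the real classes only when Uniword is loaded.
if defined?(Uniword::Transformation::HtmlToOoxmlConverter) && defined?(Uniword::Document)
  doc = build_document(
    '<p>Hello World</p>',
    converter: Uniword::Transformation::HtmlToOoxmlConverter,
    document: Uniword::Document.new
  )
  doc.save('pipeline.docx')
end
```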
The converter can also produce tables from HTML table markup:
tables = Uniword::Transformation::HtmlToOoxmlConverter.html_to_tables(
  '<table><tr><td>A</td><td>B</td></tr></table>'
)
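Table markup is often generated from row data rather than written by hand. The following is a minimal sketch, assuming a hypothetical helper `html_table` (not part of Uniword) that renders an array of rows as a bare table and escapes cell text with the standard `ERB::Util.html_escape`. The converter call is guarded so the snippet runs without the gem:

```ruby
require 'erb'

# Hypothetical helper, not part of Uniword: renders rows (an array of
# arrays) as a minimal HTML table with one <td> per cell.
def html_table(rows)
  body = rows.map do |cells|
    tds = cells.map { |cell| "<td>#{ERB::Util.html_escape(cell.to_s)}</td>" }.join
    "<tr>#{tds}</tr>"
  end.join
  "<table>#{body}</table>"
end

markup = html_table([%w[A B], [1, 2]])

# Convert only when the converter is available.
if defined?(Uniword::Transformation::HtmlToOoxmlConverter)
  tables = Uniword::Transformation::HtmlToOoxmlConverter.html_to_tables(markup)
  puts "Converted #{tables.size} table(s)"
end
```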