Word to html template



Converting DOC, DOCX, and ODF files to HTML is fairly easy, but there are some nuances. If you want a web page that loads quickly and displays correctly in all browsers, use online tools. If you want to preserve the formatting of the original document, use Word.

  1. The quickest and simplest way to convert is to copy and paste the Word document into the TextFixer converter or to upload it to Online-Convert.com. These free tools convert a document to HTML quickly, but some of the document's formatting will be lost.

  2. If you need a more full-featured tool, or you are not satisfied with the results of the tools above, try the following free services:

    • Word2CleanHTML preserves most of the original document's formatting and produces an HTML page suitable for web development.[1]
      This tool lets you configure conversion options, for example what to do with non-standard characters or empty paragraphs.
    • The ZamZar.com converter can convert documents both to HTML5 and to the older HTML4 format (which works in most browsers and may be more familiar to some users). To use this tool you will need to enter your email address.
  3. Google Drive. This service is useful if you are working on a Word document together with other people; after converting the document to HTML, you can invite your colleagues to look at the result.[2]

    • Sign in to Google Drive.
    • Click the red Create button and choose Document.
    • Copy the text of your document into the empty document.
    • In the Google Docs menu, click File > Download as > Web page.
  4. If you want to convert hundreds of documents to HTML, use paid software that can convert many files at once. Here are a few such programs:

    • WordCleaner
    • NCH Doxillion
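
    For readers who prefer a script to a paid program, batch conversion can also be sketched in Python. This is not one of the tools listed above: it is a minimal sketch that assumes LibreOffice is installed and that its `soffice` executable is on the PATH. The function names `build_command` and `convert_folder` are hypothetical helpers made up for this illustration.

```python
import subprocess
from pathlib import Path


def build_command(docx_path, out_dir, soffice="soffice"):
    """Return the LibreOffice command line that converts one file to HTML."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_folder(src_dir, out_dir, soffice="soffice"):
    """Convert every .docx file in src_dir and return the files processed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = []
    for docx in sorted(Path(src_dir).glob("*.docx")):
        # check=True raises an error if LibreOffice reports a failure.
        subprocess.run(build_command(docx, out, soffice), check=True)
        done.append(docx)
    return done
```

    Calling `convert_folder("docs", "html_out")` would process the files one at a time. As with the online tools, expect some formatting to be lost.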


  1. Open the document in Microsoft Word or OpenOffice. These word processors can convert documents to HTML, but the resulting files are large, and some browsers may not support the text formatting.[3]
    Even so, an HTML file like this is easy to convert back into a Word document for further editing.

  2. Click the Office button (in the upper-left corner of the screen) or File (in older versions of MS Office) and choose Save As from the menu.

  3. In the drop-down menu, choose Web Page to save the document in HTML format.[4]

    • If you cannot find this option, change the file extension to .htm or .html and enclose the file name in quotation marks: "ExampleFile.html".[5]
  4. In some versions of Word you can save the document as an HTML file that looks very similar to the original but loads faster as a web page. If you do not plan to convert the HTML file back into a Word document, choose Web Page, Filtered.[6]

    • If this option is not available, save the document as a regular web page and then use the online tool AlgoTech's Mess Cleaner to reduce the regular web page to a small HTML file.


Tips

  • In Word, click View > Web Layout to preview how the HTML file will look.


Warnings

  • During conversion to HTML, some of the Word document's formatting and text styles will be lost. To fix the text formatting, use CSS (a formal language for describing how a document looks).


Published: 14.04.2020


The barrel of honey

LMS Collaborator can create learning resources from an existing MS Word document. Just upload a DOCX file and the system will offer to convert it into a resource of the "Page" type. This saves resource authors a significant amount of time.
Procedure:

  1. Create a new resource of the "File" type.
  2. Upload the document.
  3. After the upload, the system asks "Convert the file into a Page resource?"; agree.
  4. Save and view the result.

The result is a single-page HTML resource that loads quickly and has every chance of displaying perfectly on any device, from a desktop to a mobile phone.


The fly in the ointment

The conversion assumes, however, that you laid out the DOCX document using common conventions and formatting that has a direct equivalent in HTML. Where there is no direct equivalent, the converted result cannot match the original.
You can easily check what your document will turn into with an online Word to HTML conversion service. For example, these:

  • Online HTML converter
  • Word to HTML Converter Online. Convert Word to clean HTML (4html.net)
  • WORD to HTML | DOC to HTML

You can also check how MS Word itself handles converting its own document to HTML. Try File > Save As and choose the format "Web Page, Filtered (.htm)".
If the result matches the original closely, your document is well formatted.
An exact (or nearly exact) conversion of a DOCX document to another format is only achievable by converting to PDF.
But there are ways to prepare a document so that it converts to HTML without critical distortions.

Best practices for formatting an MS Word document

1. Heading and text styles

Use the standard heading styles: "Title", "Subtitle", "Heading 1", "Heading 2", and so on.
Type body text in the standard "Normal" style.
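
To illustrate why standard styles matter, here is a minimal Python sketch of how a converter could map paragraph styles to HTML tags. It is not the converter used by any of the services mentioned here. The style IDs in `STYLE_TO_TAG` are an assumption (localized copies of Word may store different IDs), and `docx_to_html` is a hypothetical helper that handles plain paragraphs only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Assumed style IDs; a paragraph with any other style becomes a plain <p>.
STYLE_TO_TAG = {"Title": "h1", "Heading1": "h1", "Heading2": "h2", "Heading3": "h3"}


def docx_to_html(docx_bytes):
    """Convert the paragraphs of a DOCX file (given as bytes) to simple HTML."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    parts = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        tag = STYLE_TO_TAG.get(style_id, "p")
        # Join the text of all runs in the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        if text.strip():
            parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(parts)
```

A heading that was only made large and bold by hand carries no style the converter can recognize, so it would come out as an ordinary paragraph.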

2. Spacing and indents

If you need to change paragraph indents, spacing, or font, change them directly in the text. You can modify the styles instead, but that does not always work.

3. Text alignment

This usually causes no problems. Everything converts correctly.

4. Font formatting

All the usual font changes (bold and italic, color, size) convert well.

5. Changing the typeface

A font displays correctly only if the same fonts are installed on the other person's computer as on yours. Use "web-safe fonts" (learn more about safe fonts).

6. Lists

Simple bulleted and numbered lists should cause no problems. Multilevel lists also convert reasonably well, but there will be differences.

7. Inserting pictures

Pictures display correctly when they are inserted into the text as a character, not as a separate object with text wrapping. To be safe, check the picture's "Size and Position" > "Text Wrapping" settings: the picture should be positioned in line with the text.
The most reliable approach is a separate picture in its own paragraph.
No text-wrapping settings will give a good result. Pictures will "run away" from their places and ignore your settings.
The only stable case is text wrapping to the right of the picture. Insert the picture first in the paragraph, before all the text, and in "Size and Position" > "Text Wrapping" choose the "Square" wrapping style and wrap text "Right only". You can also set the distance from the text there.

8. Changing picture size, styling, and parameters

You can resize pictures and experiment with other transformations.
As a rule, simple changes carry over to HTML correctly. Anything unusual is unlikely to be recognized properly.

9. Tables

Tables convert almost without errors.
If a table needs to stretch to the full width of the page, set it to "AutoFit to Window". In that case keep the table as simple as possible, without merged or split cells, because the conversion may be inaccurate.

Summary

There is no magic tool that converts any DOCX to HTML exactly, and there cannot be one. The two formats have too many specific differences.
If what matters to you is creating learning resources in the LMS quickly and having them adapt to reading on mobile devices, clean up and simplify your document formatting and convert to HTML. You will get a fast and acceptable result.
Need an exact match with the document? Converting to PDF handles that best, but you will have to give up comfortable reading on a smartphone screen.
If you have the time and inspiration, learn to lay out HTML documents using Bootstrap or its alternatives (Bootstrap Alternatives: Top 10 Best Frontend Frameworks).
P.S. We have an example of a correctly prepared Word file. You can use it as a template for your own documents. Go to the LMS Collaborator demo portal, open the Catalog of learning assignments, and get it via this link:

Just grab it and download.

P.S. If you have suggestions, questions, remarks, or ideas, write them in the comments.



The LMS Collaborator Team

Content Manager LMS Collaborator



Microsoft keeps updating its Office products to present a better experience for its customers. Of the many new features it has introduced, one is to transform a Microsoft Word document into a webpage. While this feature existed earlier as well, the new method is way easier.

How to convert and display Word document file in HTML web page template

You can convert and display a Word document file in an HTML web page template using one of the following methods.

1] How to transform a Word document to a webpage using Transform option

  • Once you have created your document, click on File on the top.
  • In the File menu, select Transform.
  • A side pane named Transform to Web Page opens immediately.
  • From the available Style templates, choose your favorite one.
  • Now, click on Transform. If you are signed in to MS Office with your Microsoft account, the same login is used. Otherwise, you will be prompted to sign in.
  • You will be directed to the Microsoft Sway online page.
  • You would have to log in on the browser as well if you are not already logged in there.

The webpage will appear. You can use it the way you please.

2] How to convert a Word document to HTML webpage using Save as option

You can also save your Microsoft Word document as an HTML webpage on your system. The procedure is as follows:

  • Once you are done with creating the document, go to File.
  • Now, select Save as.
  • Click on Browse.
  • Now in Save as type, choose Webpage and save the document to the appropriate location.

One major difference in these 2 methods of transforming an MS Word document into a webpage is that in the first case, you can choose the template and make appropriate modifications. Should you learn MS Sway, more customization is possible.

In the second case, which is also the more traditional method, you do not have the option of choosing the template. The document is saved as an HTML file.

Use of transforming a Microsoft Word document into a webpage

The best use of transforming a Microsoft Word document into a webpage is to publish data that you have stored as documents on your website. When you have a massive quantity of research, the mentioned method is the best way to put it online.

How do I open the MS Word webpage?

When you save the MS Word document as a webpage, it is saved with the extension .htm in HTML format. When you click on this HTML file, it opens in your default browser. The system does not need to be connected to the internet for this.

However, if you need to add it to your website or use it practically, another system needs to be used. The new method to transform your document using Microsoft Sway is helpful.

What is Microsoft Sway?

Microsoft Sway is an application that turns text and videos into online presentations, which in turn form web pages and can be used for creating websites.


Karan is a B.Tech, with several years of experience as an IT Analyst. He is a passionate Windows user who loves troubleshooting problems and writing about Microsoft technologies.

Word to HTML

phpdocx Advanced and Premium licenses include the functionality of transforming DOCX files to HTML with native PHP classes.

There are currently two ways to transform Word to HTML with phpdocx:

  • With the conversion plugin
  • With the TransformDocAdvHTML native PHP class

The conversion plugin runs LibreOffice or OpenOffice to perform the conversion. This method has disadvantages: it is not native PHP, it requires calling external programs, and the output can only be customized through PHP DOM modifications after the conversion.

The native PHP classes included in Advanced and Premium licenses transform DOCX to HTML with PHP exclusively. The main features of this functionality are the following:

  • Conversion of contents, styles and properties
  • Native PHP classes
  • Easily customizable
  • Transform DOCX created from scratch and templates

The transformation can be done using just three lines of code:

where document.docx can be a DOCX created with phpdocx or from other source (MS Word, LibreOffice, etc). Premium licenses can also transform in-memory documents.


Supported OOXML tags and attributes

phpdocx parses contents, styles, properties and other XML contents.

The list of currently parsed contents and styles includes (OOXML content/style and HTML/CSS transformation):

  • document (w:body) : <body>

    • background color (w:background) => w:color (background-color)
    • background image (v:background) => id (background-image)
    • border (w:pgBorders) => w:top (border-top), w:bottom (border-bottom), w:left (border-left), w:right (border-right): w:color (border-color: #HEX), w:sz (border-width), w:val (border-style: nil, none, dashed, dotted, double, solid), w:space (padding)
  • sections (w:sectPr) : <section>

    • size (w:pgSz) => w:w (max-width)
    • margin (w:pgMar) => w:top (margin-top), w:bottom (margin-bottom), w:left (margin-left), w:right (margin-right)
    • columns (w:cols) => w:num (columns)
  • title and metas (cp:coreProperties) : <title>, <meta>

    • title (dc:title) => <title>
    • author (dc:creator) => <meta> (author)
    • description (dc:description) => <meta> (description)
    • keywords (cp:keywords) => <meta> (keywords)
  • text strings (w:t) and text styles (w:rPr) : <span>

    • text (w:t) => <span>
    • bold (w:b) => w:val (font-weight: bold)
    • color (w:color) => w:val (color: #HEX)
    • double line through (w:dstrike) => w:val (text-decoration-style: double)
    • font family (w:rFonts) => w:ascii (font-family), w:cs (font-family)
    • font size (w:sz) => w:val (font-size)
    • highlight (w:highlight) => w:val (background-color)
    • italic (w:i) => w:val (font-style: italic)
    • line through (w:strike) => w:on (text-decoration: line-through)
    • lower case (w:smallCaps) => w:val (text-transform: uppercase; font-size: small)
    • text decoration (w:u) => w:val (text-decoration: none or underline; text-decoration-style: dashed, dotted, double, solid, wavy, none)
    • upper case (w:caps) => w:val (text-transform: uppercase)
    • vanish (w:vanish) => w:val (visibility: hidden; visibility: visible)
    • vertical align (w:vertAlign) => w:val (vertical-align: sub; vertical-align: super)
  • paragraphs (w:pPr) : <p>

    • background color (w:shd) => w:shd (background-color)
    • bold (w:b) => w:val (font-weight: bold)
    • border (w:pBdr) => w:top (border-top), w:bottom (border-bottom), w:left (border-left), w:right (border-right), w:color (border-color: #HEX), w:sz (border-width), w:val (border-style: nil, none, dashed, dotted, double, solid), w:space (padding)
    • color (w:color) => w:val (color: #HEX)
    • double line-through (w:dstrike) => w:val (text-decoration-style: double)
    • font family (w:rFonts) => w:ascii (font-family)
    • font size (w:sz) => w:val (font-size)
    • heading (w:outlineLvl) => w:val (h1, h2, h3, h4, h5, h6)
    • highlight (w:highlight) => w:val (background-color)
    • italic (w:i) => w:val (font-style: italic)
    • line height (w:spacing) => w:line (line-height)
    • line through (w:strike) => w:on (text-decoration: line-through)
    • lower case (w:smallCaps) => w:val (text-transform: lowercase)
    • margin (w:ind, w:spacing) => w:left (margin-left), w:start (margin-left), w:right (margin-right), w:end (margin-right), w:after (margin-bottom), w:before (margin-top)
    • padding (w:hanging) => w:hanging (padding-left, text-indent)
    • page break (w:pageBreakBefore) => w:val (page-break-before: always)
    • text align (w:jc) => w:val (text-align: left, justify, center, right)
    • text decoration (w:u) => w:val (text-decoration: none or underline; text-decoration-style: dashed, dotted, double, solid, wavy, none)
    • text indent (w:firstLine) => w:firstLine (text-indent)
    • text direction (w:textDirection) => w:val tbRl (direction: rtl; text-align: right;)
    • upper case (w:caps) => w:val (text-transform: uppercase)
    • vertical-align (w:vertAlign) => w:val (vertical-align: sub; vertical-align: super)
    • word wrap (w:wordWrap) => w:val (word-wrap: break-word)
  • lists (w:numPr) : <ul>, <ol>, <li>

    • type (w:numId) => w:val and w:ilvl (list-style-type: circle, disc, decimal, lower-alpha, lower-roman, upper-alpha, upper-roman)
    • view paragraphs elements for other styles
    • some styles such as color or font sizes can be inherited to the li content from the li symbol. In this case, the content must have its own style
  • links : <a>

    • bookmark (w:bookmarkStart, w:bookmarkEnd) => w:name (<a>)
    • cross-reference (w:instrText) => PAGEREF (<a>)
    • link (w:instrText) => HYPERLINK (<a>)
  • form elements

    • checkbox (w:instrText) => (<input> checkbox)
    • date (w:date) => (<input> date)
    • input (w:instrText) => (<input> text)
    • select (w:instrText, w:comboBox) => (<select>)
  • styles (view elements on this same page for supported styles)

    • character/run (w:rPr)
    • paragraph (w:pPr)
    • list (w:pPr, w:numId, w:ilvl)
    • table (w:style, w:pPr, w:rPr)
    • styles file (w:styles) => character/run (w:rStyle), paragraph and list (w:pStyle), table
    • numbering file => list (w:abstractNum)
    • default styles (w:docDefaults, w:style w:default="1") => w:pPr, w:rPr
  • tables (w:tbl) : <table>

    • align (w:jc) => w:val (margin-left, margin-right)
    • border (w:tblBorders) => w:top, w:right, w:bottom, w:left (border-: width style [dashed, dotted, double, none, solid] color)
    • layout (w:tblLayout) => w:type fixed (table-layout)
    • margin (w:tblInd, w:tblpPr) => w:w (margin-left), w:bottomFromText (margin-bottom), w:topFromText (margin-top)
    • width (w:tblW) => w:type pct, dxa w:w (width)
    • first col style (w:tblStylePr) => w:type (w:rPr styles)
    • first row style (w:tblStylePr) => w:type (w:rPr and w:pPr styles)
    • last col style (w:tblStylePr) => w:type (w:rPr styles)
    • last row style (w:tblStylePr) => w:type (w:rPr and w:pPr styles)
    • band1Horz style (w:tblStylePr) => w:type (w:rPr and w:pPr styles)
    • band2Horz style (w:tblStylePr) => w:type (w:rPr and w:pPr styles)
    • row height (w:trPr) => w:trHeight (height)
    • rowspan (w:vMerge) => w:val restart, continue (rowspan)
    • cell background color (w:shd) => w:fill (background-color)
    • cell border (w:tcPr) => w:top, w:right, w:bottom, w:left (border-: width style [dashed, dotted, double, none, solid] color)
    • cell padding (w:tblCellMar) => w:top (padding-top), w:right (padding-right), w:bottom (padding-bottom), w:left (padding-left)
    • cell vertical align (w:vAlign) => top, bottom, center, both and default w:val (vertical-align)
    • cell width (w:tcW) => w:w (width)
    • colspan (w:gridSpan) => w:val (colspan)
    • text direction (w:textDirection) => w:val btLr, tbLrV, tbRl and tbRlV (writing-mode, transform, white-space)
  • images (w:drawing) : <img>

    • Supported image formats: png, jpg and other formats supported by web browsers. WMF is supported if ImageMagick is installed
    • border (a:ln, a:noFill) => w (width), a:prstDash (style: dashed, dotted, solid), a:srgbClr (color)
    • float (wp:positionH, wp:align) => right (float: right), left (float: left), center (display:block; margin-left: auto; margin-right: auto)
    • height (wp:extent) => cy (height)
    • link (a:hlinkClick) => r:id (href)
    • margin (wp:effectExtent, wp:positionH, wp:positionV) => t (margin-top), r (margin-right), b (margin-bottom), l (margin-left), wp:positionH wp:posOffset (margin-left), wp:positionV wp:posOffset (margin-top)
    • text wrapping (wp:inline, wp:anchor) => wp:inline (display: inline), wp:wrapSquare (float: left), wp:wrapNone behindDoc (position: absolute; z-index: -1)
    • width (wp:extent) => cx (width)
    • src (r:embed, r:link) => embedded and linked images
    • saved as files or as base64 (only for embedded images)
  • charts (w:drawing) : <div>

    • Supported charts: bar (group, stack and percent), column (group, stack and percent), pie, doughnut and line charts
    • Plotly JS library (MIT license) [https://plotly.com/javascript/] is used as default chart library
    • height (cy)
    • labels (c:cat)
    • legends (c:tx)
    • orientation (h, v)
    • values (c:val)
    • width (cx)
    • Plotly default colors are used
  • other elements

    • break (w:br) => (<br>)
    • comment (w:commentReference, w:comment) => added to the bottom of the page (<span>)
    • date (w:instrText) => TIME (<span>)
    • endnote (w:endnoteReference, w:endnote) => added to the bottom of the page (<span>)
    • external file (w:altChunk) => r:id (<a>)
    • footer (w:footerReference, w:ftr) => (<footer>) added to the bottom of its section
    • header (w:headerReference, w:hdr) => (<header>) added to the top of its section
    • footnote (w:footnoteReference) => added to the bottom of the page (<span>)
    • math equations => Office MathML
    • simple fields (w:fldSimple) => AUTHOR, COMMENTS, LASTSAVEDBY, TITLE
    • tabs (w:tab) => (<span>) margin-left default
    • textbox (v:textbox) => (<div>), style (min-height, float, width), fillcolor (background-color), margin-top (margin-top), strokecolor (border-color, border-style), strokeweight (border-width)
    • tracked contents (w:ins, w:del) => (<ins>, <del>)

    WARNING:

  • The fact that a tag is not parsed does not mean its content disappears from the HTML output. It only implies that their associated OOXML properties are not taken directly into account. Their children and text content will be parsed and rendered with their corresponding styles into the HTML output.
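
To make the mapping above more concrete, here is a minimal Python sketch of how a few run properties could be translated into CSS. It is not phpdocx code and does not reflect how phpdocx is implemented. `run_css` is a hypothetical function, and it covers only four of the properties listed (w:b, w:i, w:color, w:sz).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_css(rpr_xml):
    """Translate a w:rPr fragment into a CSS declaration string."""
    rpr = ET.fromstring(rpr_xml)
    val = f"{{{W}}}val"
    css = []
    for child in rpr:
        name = child.tag.rpartition("}")[2]  # strip the namespace part
        value = child.get(val)
        if name == "b" and value not in ("0", "false"):
            css.append("font-weight: bold")
        elif name == "i" and value not in ("0", "false"):
            css.append("font-style: italic")
        elif name == "color" and value and value != "auto":
            css.append(f"color: #{value}")
        elif name == "sz" and value:
            # w:sz is measured in half-points, so 24 means 12pt.
            css.append(f"font-size: {int(value) / 2:g}pt")
    return "; ".join(css)
```

Properties the function does not recognize are simply skipped, which mirrors the warning above: the text itself is still output, only the unparsed styling is lost.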

The transformation features included in phpdocx can handle complex DOCX documents generated from scratch or from templates. Let’s take a look at some samples and their HTML output.

DOCX with an A4 section and paragraphs:

DOCX with tables:

DOCX with lists and text styles:

DOCX with headers and footers:

DOCX from a template:

DOCX with charts:


How to customize transformations

Nearly all the functionalities available for performing DOCX to HTML transformations can be customized.

The two main classes for transformations are: TransformDocAdvHTML and TransformDocAdvHTMLPlugin.

TransformDocAdvHTML is the class that parses DOCX structures and performs the transformation to HTML. Its constructor receives an object of the TransformDocAdvHTMLPlugin type that sets the export options. This class can be extended to customize the transformation of each element, e.g., transformW_BOOKMARKSTART for bookmarks or transformW_SECTPR for sections.

TransformDocAdvHTMLPlugin makes it possible to write transformation plugins that fit the project requirements, e.g., inserting images as base64, ignoring sections, customizing conversion factors, choosing how export sizes are set, and setting CSS, JavaScript and custom HTML. phpdocx includes TransformDocAdvHTMLDefaultPlugin, the default plugin for performing transformations.

All the available options are thoroughly explained in the API documentation page of the transformDocAdvHTML method.

Many people would agree that Microsoft Word is a versatile program. However, it may not be the best conversion tool for HTML. One reason is that Microsoft adds extra code that allows you to easily switch document formats. The result is large files and code that may cause rendering issues. In this tutorial, I will show some free and paid Word to HTML converters.

Like many people, I use Microsoft Word to write. I sometimes write my articles in Word and convert them into my content management system, which is WordPress. It can be a trade-off between convenience, functionality, and file size. There are lots of online tools that can do this file conversion, but they don’t always deliver clean HTML.

Document Complexity & Needs

When I first wrote this article in 2008, the conversion options were much different. For example, Gmail had an option to view an attachment as HTML. There were also a couple of commercial vendors who have since disappeared. I’ve updated the article to reflect those changes. The main considerations I see are:

  • How complicated is your Word document?
  • Does your Word document have images?
  • Does your HTML document need to have the same formatting?
  • Are you comfortable uploading a doc file to a 3rd party service?
  • Where will the HTML document be seen?
  • Will the document be viewed on a mobile phone?
  • Are you willing to pay for conversion?
  • How often do you need to convert Word documents?

Regardless of complexity, there are common issues I see. None of the systems below are perfect, so you’ll probably have to do some tweaking. The issues I encountered include:

  • Most conversion systems will mark your Title as a paragraph.
  • Apostrophes and special characters may be displayed on black backgrounds.
  • Embedded images may not be shown or be placed in a new folder as separate image files

Microsoft Word Options

One logical place to start would be with Microsoft and to see if Word can format the file as HTML. While Word is not an HTML editor like VS Code, it can save your files in different file formats. There are three options, but before you convert your document, make sure you’ve saved the original in the .docx format. This will allow you to test the various options without overwriting your content.

Word – Save As Web Page

You might think the best and most convenient way to get your Word document to HTML is to use the Save as type: Web Page. Then you could upload the saved HTML file to a web server. However, there are two issues you should review with this file type.

This Web Page format appends information from the File Properties dialog and the document template. These data elements include author, last author, company, document stats, and so on. You can see some of these elements in the image below. As you might guess, a lot of this relates to style information from your Microsoft Word template or normal.dot.

Converted Word HTML file.

Extra info Word adds to source code

The Web File version is probably fine for company intranets, where users aren’t as concerned about privacy. Some of this information could be seen if you emailed the Word file to a co-worker. In contrast, I wouldn’t use this format to post your resume on the web, especially if you wrote it using a company PC that shows the organization’s name. These files contain too many proprietary tags.

The second issue is that this HTML format adds tags to the file. One function of these tags is to convert your Microsoft Word style information. These tags also make it easier to go from one file type to another, for example from .HTML to .RTF or back to .DOCX. However, the <body> tag doesn’t even show up until line 1093.

Word adds extra tag information to web page.

HTML file with extra info shown in VS Code

This extra code increases the size of your web page. This may not sound like an issue, but it can be, depending on your document size. And the extra code may cause rendering issues on some devices.
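
If you want to measure this overhead on your own files, a few lines of Python are enough. This is only a sketch: `markup_overhead` is a hypothetical helper name, and it reports the size of the HTML text in bytes together with the line on which the `<body>` tag first appears.

```python
def markup_overhead(html):
    """Return (size in bytes, 1-based line of the first <body> tag or None)."""
    size = len(html.encode("utf-8"))
    for number, line in enumerate(html.splitlines(), start=1):
        if "<body" in line.lower():
            return size, number
    return size, None
```

Running it on the same document saved in different formats gives a quick comparison of how much header material each format adds before the content starts.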

Another drawback of this extra code is if you need to edit the HTML file. Most HTML documents have a separate CSS file that controls styling. With converted documents, this styling is done inline. However, based on how the initial Microsoft Word document was styled, you might have to change every paragraph or span. With a CSS file, you’d probably make one change.
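
One way to get back to a file you can style from a single CSS file is to strip the inline styling first. The following Python sketch does that with the standard library only. It is an assumption about what a clean-up could look like, not the method used by any tool in this article, and the names `InlineStyleStripper` and `strip_inline_styles` are made up for the example.

```python
from html import escape
from html.parser import HTMLParser


class InlineStyleStripper(HTMLParser):
    """Rebuild HTML without style/class/lang attributes and without <span>."""

    DROP_ATTRS = {"style", "class", "lang"}
    UNWRAP_TAGS = {"span"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.UNWRAP_TAGS:
            return  # drop the tag but keep its text
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in self.DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are written as a single open tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in self.UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_inline_styles(html):
    parser = InlineStyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because the class does not handle comments, they are dropped from the output as well. After a pass like this, one rule in a stylesheet can restyle every paragraph at once.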

Lastly, the program will create another folder that contains supporting files such as your images.

Extra folder Word creates for HTML file conversion.
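
Before uploading the page, it helps to know which files in that folder the HTML actually refers to. Here is a small Python sketch that lists the image paths. `ImageCollector` and `referenced_images` are hypothetical names, and the folder name used in the usage note is only an example.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def referenced_images(html):
    """Return the image paths referenced by the page, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.sources
```

For a page containing `<img src="report_files/image001.png">`, the function returns that path, which tells you the supporting folder has to be uploaded alongside the HTML file.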

Word – Save As Filtered Web Page

Microsoft has another HTML file format called Web Page, Filtered. This file type strips most of the document information and focuses on the content. It also cuts the number of document and template style codes. Although considerably smaller than “save as web page”, this file format still contains numerous classes and span references.

The size was cut from 92K to 31K with this format on my test page. Much of the savings in this example came from the removal of the document information. For example, in my first file, the heading tag for Example 1 was on line 1093. In the Web Page, filtered format, the same heading is at line 128.

Converted Word doc using Web Page, filtered

The bottom line is the Microsoft Word conversion options are free and offer you convenience. The downside is you may reveal too much info in your document, and future HTML editing may be tougher with all the extra info. And if you decide to use the Save as Filtered Web Page, you will lose some Microsoft Office formatting features if you re-open the file in Word.

Content Management Systems (CMS)

Many content management systems promise that it’s easy to create content using WYSIWYG. Ideally, you write your article in their HTML editor. I’ve yet to find an editor that gives me the functionality or space I need, which is why I sometimes write in Microsoft Word.

Some CMS editors provide a Paste from Word toolbar button.

Paste Word toolbar button.

Example of a CMS with Paste Word tool

These utilities will remove some tags, but not all. Depending on your document, there might be lots of these tags. Some systems also offer a Paste as Text button. This button will remove all text formatting. This button works well for simple documents, but if you have any formatting for lists, tables, paragraphs, and so on, you might spend more time reapplying the formatting.

WordPress Gutenberg Block Editor

If you haven’t tried the Gutenberg block editor in WordPress, you should. I was pleasantly surprised to see how much styling it retained. The screen snap below shows a recent paste and created blocks. If I go to view the HTML code of the second block, I can see it properly coded the unordered list.

Pasted Word content into Gutenberg editor

The process isn’t foolproof, but if you’re already using WordPress, it’s probably the best option.

Security Concerns and Risks

Any time you use another program or service to convert your files, please read its Terms of Service (TOS) first. There are many tools that can assist you; however, some tools add code to your page. This could be something like a note indicating the file was converted by XYZ service. In many cases, that’s the “price” you pay for using a free service. I think it’s fine for the vendor to note that.

Sadly, other vendors inject irrelevant links that amount to SEO spam. I have seen this with WordPress plugins, but not with Word to HTML conversion or HTML cleaning services. However, SaaS bootstrapper has a nice blog post detailing an issue he discovered with link injection. It’s worth a read, and I applaud his desire to find out what was going on.

The good news is that none of the services referenced below were mentioned in the blog post. The author does provide a list of bad actors.
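If you want to check a converted file yourself, one approach is to list every link that points to a host you don't recognize. Here is a minimal Python sketch of that idea using only the standard library. The names LinkCollector and unexpected_links, and the allowed_hosts parameter, are assumptions made up for this example. It only looks at `<a href>` values, so scripts or other injected markup would need a separate check.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def unexpected_links(html_text, allowed_hosts):
    """Return links whose host is not in allowed_hosts.

    Relative links and mailto: links have no host and are skipped.
    """
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    flagged = []
    for href in collector.links:
        host = urlparse(href).hostname
        if host and host not in allowed_hosts:
            flagged.append(href)
    return flagged
```

You would pass in the converted HTML along with the set of hosts you actually linked to in your Word document. Anything returned is worth a second look before you publish.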

Paste & Convert Solutions

Another way to convert Word documents is to use an online service. These free services work best with simple text. One advantage is that these services don't save your files, and no uploads are required. The major drawback is that images are typically ignored.

Word2CleanHTML

Word2CleanHTML provides a free conversion service. You can copy and paste your Word document into a textbox. The advantage is they provide several checkboxes for additional filtering, such as removing blank paragraph tags or converting “smart quotes.”

They also provide tabs so you can compare the “Original HTML,” “Clean HTML,” and “Preview.” These tabs were useful as they helped me spot a blank table in my original document. The main drawback is they don’t handle images, but as I said above, those are easier to add back if you don’t have too many and know a bit of HTML.
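To give a feel for what filters like removing blank paragraphs and converting smart quotes do, here is a minimal Python sketch. It is not how Word2CleanHTML works internally, which I don't know. The function name tidy_pasted_html and the regular expression are assumptions for illustration only.

```python
import re

# Map curly ("smart") quotes to their straight equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
})

# A paragraph that holds only whitespace or non-breaking spaces.
EMPTY_P = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;)*</p>", re.IGNORECASE)


def tidy_pasted_html(html_text):
    """Drop blank <p> elements, then straighten smart quotes."""
    without_blanks = EMPTY_P.sub("", html_text)
    return without_blanks.translate(SMART_QUOTES)
```

A regular expression is fine for a quick pass like this, but it is a blunt tool. For anything beyond simple pasted text, a real HTML parser is the safer choice.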

TextFixer

One of the simplest converters to use is TextFixer. You copy the Word document and then paste the contents into a textbox. The service does a reasonable job of providing HTML but strips out any images. Unfortunately, if you used fancy list bullets or Wingdings, you might see a different character.

This site also has a number of other free text tools and tutorials that might interest you, such as converting HTML to text or text sorting.

Upload File and Convert Solutions

Another group of services allows you to upload the Word .doc or .docx file for conversion. In some cases, the service also converts file types other than Microsoft Word. Typically, these services do a better job of conversion because they retain images. However, some people don’t like uploading files to another site, especially if it’s confidential information that might go on a corporate intranet.

Online-Convert.com

Online-Convert.com is a service that converts many file types, as the image below shows. It’s similar to a service I reviewed some time back called Zamzar. The process is simple in that you choose the end file type you need, such as HTML. You then upload your Word source document. One advantage is that you can upload files from a URL, Google Drive or Dropbox.

Convert different file forms to HTML

Your file will be converted, including any images. The images are converted to PNGs and included in the zipped file. The resulting file contains more inline styling than the previous solutions, but not nearly as much as Microsoft Word or Google Docs. You may also want to check whether the images need further compression. Some additional META tags are added in the head section. If you prefer, you can also opt to have a download link for the converted file sent to an email address.
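A quick way to spot images that might need more compression is to list the larger PNG files inside the downloaded zip. This is a minimal Python sketch using the standard zipfile module. The function name large_images and the 200,000-byte default are assumptions I picked for illustration, not anything Online-Convert.com specifies.

```python
import zipfile


def large_images(zip_path, limit_bytes=200_000):
    """Return (size, name) pairs for PNG files larger than limit_bytes.

    zip_path may be a file path or an open binary file object.
    Sizes are the uncompressed sizes recorded in the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            (info.file_size, info.filename)
            for info in archive.infolist()
            if info.filename.lower().endswith(".png")
            and info.file_size > limit_bytes
        )
```

Anything this returns is a candidate for an image optimizer before you upload the page.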

Word to HTML

This service is a versatile Word to HTML converter. It has plenty of features for the free version, but it also has a paid or Pro component that adds more. Unlike sites that do lots of file conversions, WordtoHTML specializes in HTML. They also have the most control over CSS and even JavaScript. The free version doesn’t allow file uploads or embedding images. However, you can provide a URL for images, and they will add the image tags, size, and description.

Free version of Word to HTML

You’ll also need to copy the code from the HTML editor as the free version doesn’t allow file downloads.

The WordtoHTML Pro version costs $90 per year or $10 per month and includes additional features. If you routinely convert Microsoft Word files or PDF files to HTML, this is a good option because you can customize and save your settings in template files. For example, you may want to include head or meta-information or prefer a certain type of formatting.

This is one of the few services I’ve seen that has PDF to HTML conversions. It worked pretty well, but, as with complex Word documents, you may see issues. For example, drop caps don’t always come in. And if people have tweaked the letter kerning to get words to fit, you may see errors. The Find and Replace option is also handy if you need to replace character entities.

Complex Documents and Batch Conversions

The other situation some companies run into is how to convert hundreds of Word documents to HTML files. I doubt anyone would want to do these file conversions one document at a time. Instead, they could use another commercial program called DocConverterPro. The program allows you to batch convert .doc, .docx, .rtf and PDF files to HTML or XHTML. This product is the new and improved version of the WordCleaner program I used back in 2008. It’s been rebranded.
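Before committing to a batch tool, it can help to know how many files you're dealing with. The sketch below only takes an inventory of a folder. It doesn't convert anything and doesn't use DocConverterPro in any way. The function name find_convertible is made up for this example, and the extension list simply mirrors the formats mentioned above.

```python
from pathlib import Path

# File types mentioned above as candidates for batch conversion.
BATCH_EXTENSIONS = {".doc", ".docx", ".rtf", ".pdf"}


def find_convertible(folder):
    """Return a sorted list of files under folder with a matching extension."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in BATCH_EXTENSIONS
    )
```

Calling len() on the result gives you a rough idea of whether converting one document at a time is realistic.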

DocConverterPro does more than batch HTML conversion. I used this software years ago to convert my Word documents for this site when I was using Joomla. At the time, I wasn’t familiar enough with HTML. One of the nice features is you can create conversion templates. For example, I might have one template for content on this website where I use CSS files to handle the presentation. On another site, I might choose to embed the style information in the file. It’s very flexible and powerful.

You can edit conversion templates

The service also has a Microsoft Windows program. That was the method I used many years ago, but I think the online version is easier. Pricing does vary between the services and options. For example, the online version is $100 yearly or $10 per month.

Clearly, there are a lot of options when it comes to converting a Word file to HTML. A lot of it comes down to time, budget and document structure. If you have a lot of documents, I’d lean toward some of the paid solutions. These tools will reduce the overall conversion time and give you a more consistent result.

