JavaScript Word to PDF

Reading time: 10 minutes | Coding time: 15 minutes

In this article, you will learn how to create PDF files from your Word documents. Until recently, creating complex or elegant PDFs in JavaScript was challenging. Here I'm going to show you, step by step, how to create beautiful PDFs.

Before we dive into the process, let's look at what a Word document is, what a PDF is, and why you would convert one to the other.
Microsoft Word is a popular word-processing program used primarily for creating documents such as letters, brochures, learning activities, tests, quizzes and students' homework assignments. DOC stands for DOCument file. A DOC file can contain formatted text, images, tables, graphs, charts, page formatting, and print settings.

The Portable Document Format (PDF) is a file format developed by Adobe in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.
PDFs were created to avoid the changes that different hardware or software can introduce, which you may have run into if you have ever used Word documents. For example, when you take a Word document to a print shop, the shop may use a different OS, hardware, or even software, and the printed document may not match what you saved on your computer: there may be large gaps between lines, or the formatting may be distorted. That's where PDF comes to the rescue.

Word to PDF approaches

There are three methods that I am going to discuss today which are very easy to use and produce excellent results. These are using:

  • awesome-unoconv
  • libreoffice-convert
  • docx-pdf

1. awesome-unoconv

awesome-unoconv is a Node.js wrapper for converting Office files to PDF or HTML.

REQUIREMENT

unoconv is required, which in turn requires LibreOffice (or OpenOffice).
On Debian-based Linux distributions, you can install unoconv with the following command:

sudo apt-get install unoconv
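
Before calling the wrapper, it can help to confirm that the unoconv binary is actually reachable. The following is a minimal sketch, not part of awesome-unoconv: the helper name isCommandAvailable is hypothetical, and it assumes the tool accepts a --version argument and exits with status 0.

```javascript
const { spawnSync } = require('child_process');

// Returns true if `command` can be started and exits with status 0.
// `args` defaults to ['--version'], which is an assumption about the tool.
function isCommandAvailable(command, args = ['--version']) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  // result.error is set (for example ENOENT) when the binary cannot be found
  return !result.error && result.status === 0;
}

if (!isCommandAvailable('unoconv')) {
  console.warn('unoconv was not found on PATH; install it before converting.');
}
```

Running this check once at startup gives a clearer message than a failed conversion later on.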

INSTALLATION

npm install awesome-unoconv

CODE

const path = require('path');
const unoconv = require('awesome-unoconv');
//Place your word file in source
const sourceFilePath = path.resolve('./word_file.docx');
const outputFilePath = path.resolve('./myDoc.pdf');
 
unoconv
  .convert(sourceFilePath, outputFilePath)
  .then(result => {
    console.log(result); // return outputFilePath
  })
  .catch(err => {
    console.log(err);
  });

Now go to the terminal and run the script with node. The PDF will be created in the current working directory with the name «myDoc.pdf» (you can choose any name you like).
That is one method of converting a Word document to PDF. Let's keep going.
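
If you would rather not hard-code the output name, one option is to derive it from the input path. This is a small sketch using only Node's built-in path module; the helper name pdfPathFor is hypothetical and not part of awesome-unoconv.

```javascript
const path = require('path');

// Build "<same directory>/<same base name>.pdf" from any input file path.
function pdfPathFor(sourceFilePath) {
  const { dir, name } = path.parse(sourceFilePath);
  return path.format({ dir, name, ext: '.pdf' });
}

console.log(pdfPathFor('word_file.docx'));
```

The result can be passed as the second argument to unoconv.convert in place of a fixed outputFilePath.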

2. libreoffice-convert

A simple and fast node.js module for converting office documents to different formats.

DEPENDENCY

I am using Linux, but LibreOffice must be installed on whichever system you use: in /Applications (Mac), with your favorite package manager (Linux), or with the msi (Windows).

INSTALLATION

npm install libreoffice-convert

CODE

const libre = require('libreoffice-convert');
 
const path = require('path');
const fs = require('fs');
 
const extend = '.pdf'
const FilePath = path.join(__dirname, './word_file.docx');
const outputPath = path.join(__dirname, `./example${extend}`);
 
// Read file
const enterPath = fs.readFileSync(FilePath);
// Convert it to pdf format with undefined filter (see Libreoffice doc about filter)
libre.convert(enterPath, extend, undefined, (err, done) => {
    if (err) {
      console.log(`Error converting file: ${err}`);
      return; // stop here; `done` is undefined when the conversion fails
    }
    
    // Here in done you have pdf file which you can save or transfer in another stream
    fs.writeFileSync(outputPath, done);
});

Because this module depends on LibreOffice, you might not find it very useful if you are on Windows, but on Linux and macOS it is very popular.
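
The callback style used above can also be turned into a Promise with Node's util.promisify. The sketch below uses a stand-in function, fakeConvert, with the same (input, extension, filter, callback) shape as the call shown earlier, so it runs without LibreOffice installed; fakeConvert and demo are hypothetical names used only for illustration.

```javascript
const { promisify } = require('util');

// Stand-in with the same (input, ext, filter, callback) shape as libre.convert.
// Hypothetical: it only prepends a marker instead of doing a real conversion.
function fakeConvert(inputBuffer, ext, filter, callback) {
  if (!Buffer.isBuffer(inputBuffer)) {
    return callback(new Error('input must be a Buffer'));
  }
  callback(null, Buffer.concat([Buffer.from('%PDF-'), inputBuffer]));
}

// In a real project this would wrap require('libreoffice-convert').convert instead.
const convertAsync = promisify(fakeConvert);

async function demo() {
  const pdf = await convertAsync(Buffer.from('hello'), '.pdf', undefined);
  return pdf.toString();
}

demo().then(console.log).catch(console.error);
```

With a promisified function, errors surface through try/catch or .catch instead of an err argument that is easy to forget to check.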

Let's look at the third and last method of converting a Word document to PDF.

3. docx-pdf

It is a library that converts docx files to PDF. I consider it the best of the three in terms of output quality, and it is also the easiest to use.

INSTALLATION

npm install docx-pdf

CODE

var docxConverter = require('docx-pdf');

docxConverter('./word_file.docx','./output.pdf',function(err,result){
  if(err){
    console.log(err);
    return; // do not log a result when the conversion failed
  }
  console.log('result'+result);
});

The output should be output.pdf, created at the output path you provided.

Question

Which one of the following is the correct command to install any module/package in your project?

npm install package-name

node install package-name

install package-name

nodejs install package-name

npm install package-name

is the correct syntax. For example, to install the express module, run:
> npm install express

If you want to see the project you can use this link to my github page.

In this step-by-step tutorial, we’ll walk you through how to generate an Office Word or DOCX document programmatically and then either download it or display it in Apryse WebViewer using the JavaScript PDF library.

Everything to generate DOCX and save as a PDF is done in client-side JavaScript without any MS Office or server-side dependencies. You can do this in a vanilla JS app, React, or any other framework of your choice. The same functionality is also available if you are building a Node.js app.

For this walkthrough, we’ll use a React app. If you want to skip the steps and just look at the code, you can check out the repo on our GitHub.

Now let's get started. First, generate a React app by running the following command (my-app is a placeholder project name; use any name you like):

npx create-react-app my-app

Generate DOCX Document Programmatically

We’ll use the popular DOCX library by Dolan. This library allows us to generate DOCX documents programmatically, directly in the browser or in a Node.js environment.

npm i docx

Inside of App.js, let’s create a new function to generate our document.

import './App.css';
import { Document, Packer, Paragraph, TextRun } from 'docx';

function App() {
 // generate DOCX document
 const generateDocx = async () => {
   const doc = new Document({
     sections: [
       {
         properties: {},
         children: [
           new Paragraph({
             children: [
               new TextRun(`DOCX lib and PDFTron's WebViewer is cutting-edge`),
             ],
           }),
         ],
       },
     ],
   });

   const blob = await Packer.toBlob(doc);

   return blob;
 };

 return <div className='App'></div>;
}

export default App;

So far, we have a Word document with a text paragraph that says "DOCX lib and PDFTron's WebViewer is cutting-edge". At this stage, we can go ahead and download the DOCX document; however, let's first add some tables, populate those with data, and then display the document back to the user.

Display Generated DOCX to the User

First, we’ll need to add a viewing component to display our DOCX documents. Apryse WebViewer component can also be used to convert from DOCX to PDF, client-side, without any Office dependencies.

Create a new component in src/components/Viewer.js:

import React, { useRef, useEffect, useContext } from 'react';
import WebViewer from '@pdftron/webviewer';
import WebViewerContext from '../context/webviewer.js';

const Viewer = () => {
 const viewer = useRef(null);
 const { setInstance } = useContext(WebViewerContext);

 // if using a class, equivalent of componentDidMount
 useEffect(() => {
   WebViewer(
     {
       path: '/webviewer/lib',
     },
     viewer.current
   ).then((instance) => {
     setInstance(instance);
   });
 }, []);

 return <div className='webviewer' ref={viewer} style={{height: "100vh"}}></div>;
};

export default Viewer;

We’ll use context to recycle the WebViewer instance. Create a new file in src/context/webviewer.js:

import React from 'react';

const WebViewerContext = React.createContext({});

export default WebViewerContext;

Then, update our App.js:

import './App.css';
import { useEffect, useState } from 'react';
import { Document, Packer, Paragraph, TextRun } from 'docx';
import Viewer from './components/Viewer';
import WebViewerContext from './context/webviewer.js';

function App() {
 const [instance, setInstance] = useState();

 // generate DOCX document
 const generateDocx = async () => {
   const doc = new Document({
     sections: [
       {
         properties: {},
         children: [
           new Paragraph({
             children: [
               new TextRun(`DOCX lib and PDFTron's WebViewer is awesome!`),
             ],
           }),
         ],
       },
     ],
   });

   const blob = await Packer.toBlob(doc);

   return blob;
 };

 useEffect(() => {
   const generateAndLoadDocument = async () => {
     const docBlob = await generateDocx();
     await instance.Core.documentViewer.loadDocument(docBlob, {
       extension: 'docx',
     });
   };
   if (instance) {
     generateAndLoadDocument();
   }
 }, [instance]);

 return (
   <WebViewerContext.Provider value={{ instance, setInstance }}>
     <div className='App'>
       <Viewer />
     </div>
   </WebViewerContext.Provider>
 );
}

export default App;

At this stage, we’ve generated and loaded our Word document in WebViewer. From here, users can annotate, sign, highlight, and comment on the document.

Convert DOCX to PDF in the Browser

If we wanted to simply convert and download a PDF instead, and the UI is not needed, here’s how WebViewer can be leveraged to create a blob:

  // documentViewer and annotationManager come from the WebViewer instance
  const { documentViewer, annotationManager } = instance.Core;
  // saves the document with annotations in it
  const doc = documentViewer.getDocument();
  const xfdfString = await annotationManager.exportAnnotations();
  const data = await doc.getFileData({xfdfString});
  const arr = new Uint8Array(data);
  const blob = new Blob([arr], { type: 'application/pdf' });

You can also load the library without initializing WebViewer instance. Here is a guide that demonstrates how to get Core.

If you’d like to do it server-side, you can explore the Node.js guide.

Wrap Up

We hope you found this article helpful! If you have any questions or feedback, feel free to email me directly.

It appears that even after three years ncohen had not found an answer. It was also unclear if it had to be a free (as in dollars) solution.

The original requirements were:

using client side resources only and no plugins

Do you mean you don’t want server side conversion? Right, I would like my app to be totally autonomous.

Since all the other answers/comments only offered server side component solutions, which the author clearly stated was not what they wanted, here is a proposed answer.

The company I work for has had this solution for a few years now, that can convert DOCX (not odt yet) files to PDF completely in the browser, with no server side component required. This currently uses either asm.js/PNaCl/WASM depending on the exact browser being used.

https://www.pdftron.com/samples/web/samples/viewing/viewing/

Open an office file using the demo above, and you will see no server communication. Everything is done client side. This demo works on mobile browsers also.

Convert word document to pdf in Nodejs Example

In this post, you will learn how to convert docx files to PDF documents in JavaScript and Node.js.

Docx/doc are document file formats from Microsoft that contain images, text, tables, and styles.
PDF is a separate format from Adobe for representing content such as images, text, and styles.
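
Because these are distinct binary formats, a program can usually tell them apart from the first few bytes of the file. The sketch below is illustrative only: the function name detectFormat and its return labels are hypothetical. It relies on the facts that PDF files begin with "%PDF-", that .docx files are ZIP archives beginning with "PK" followed by bytes 03 04, and that legacy .doc files use the OLE2 compound file header.

```javascript
// Guess a document's container format from its first bytes ("magic numbers").
function detectFormat(buffer) {
  if (buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-') {
    return 'pdf';
  }
  // .docx files are ZIP archives, which start with the bytes 50 4B 03 04
  if (buffer.length >= 4 && buffer.readUInt32BE(0) === 0x504b0304) {
    return 'zip (possibly docx)';
  }
  // Legacy .doc files start with the OLE2 compound file signature D0 CF 11 E0
  if (buffer.length >= 4 && buffer.readUInt32BE(0) === 0xd0cf11e0) {
    return 'ole2 (possibly doc)';
  }
  return 'unknown';
}

console.log(detectFormat(Buffer.from('%PDF-1.7')));
```

A check like this only identifies the container, so a ZIP result still needs further inspection before it can be treated as a Word document.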

There are many online tools that convert doc to PDF. Sometimes, as a programmer, you need to do the conversion inside a JavaScript/Node.js application.

In JavaScript/Node.js, several npm packages can do the conversion:

  • docx-pdf
  • libreoffice-convert

How to Convert word document to pdf in Nodejs application

First, create a Node.js application from scratch by running the npm init -y command in a new folder:

B:\blog\jswork\nodework\doctopdf>npm init -y
Wrote to B:\blog\jswork\nodework\doctopdf\package.json:

{
  "name": "doctopdf",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}

convert docx to pdf using docx-pdf library

docx-pdf is a simple library to convert Docx to a pdf document.

First, install the docx-pdf npm library:

npm install docx-pdf --save

This will add a dependency in package.json as follows

{
 "dependencies": {
    "docx-pdf": "0.0.1"
  }
}

In JavaScript, import docx-pdf using require (CommonJS modules):

var converter = require('docx-pdf');

The converter function accepts:

  • an input file, which is a Word document
  • an output file, which is the name of the PDF document
  • a callback, which receives err with the error message if the conversion failed and result if it succeeded
  • result is an object containing a filename attribute whose value is the PDF document name and path

var converter = require('docx-pdf');

converter('test.docx', 'output.pdf', function(err, result) {
    if (err) {
        console.log("Converting Doc to PDF failed", err);
        return; // skip the success message on failure
    }
    console.log("Converting Doc to PDF successful", result);
});

The same code can be written so that it works with the async and await keywords.

Convert docx to pdf with async/await

This is useful for larger files.

The function below accepts the input and output file names.
It returns a Promise that is rejected for failed conversions and resolved for successful ones.
The docxConverter call runs inside the Promise executor, and its callback settles the Promise.

    const path = require('path');
    const docxConverter = require('docx-pdf');

    // Wrap the callback-based converter in a Promise so it can be awaited
    function ConvertDocToPdf(inputfile, outputfile) {
        return new Promise((resolve, reject) => {
            // Resolve both file names relative to this script's folder
            const inputPath = path.join(__dirname, inputfile);
            const outputPath = path.join(__dirname, outputfile);
            docxConverter(inputPath, outputPath, (err, result) => {
                return err ?
                    reject(err) :
                    resolve(result)
            })
        })
    }

Call the function with the await keyword from inside another async function:

    await ConvertDocToPdf("test.docx", "test.pdf")
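
In a CommonJS script, await is not allowed at the top level, so the call is usually wrapped in an async function with error handling. The sketch below shows that pattern with a hypothetical stand-in for ConvertDocToPdf that resolves immediately, so it runs without docx-pdf installed; the shape of the resolved object is an assumption based on the description of result above.

```javascript
// Hypothetical stand-in for the ConvertDocToPdf function described above.
// It resolves immediately so the sketch runs without docx-pdf installed.
function ConvertDocToPdf(inputfile, outputfile) {
  return Promise.resolve({ filename: outputfile });
}

// CommonJS scripts cannot use `await` at the top level,
// so the call is wrapped in an async function.
async function main() {
  try {
    const result = await ConvertDocToPdf('test.docx', 'test.pdf');
    console.log('Conversion finished:', result.filename);
    return result;
  } catch (err) {
    console.error('Conversion failed:', err);
    process.exitCode = 1; // signal failure to the calling shell
  }
}

main();
```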

It is a simple library; its only disadvantage is that it cannot convert formatting styles.

libreoffice-convert npm package

LibreOffice is an open-source office suite for managing office documents.

libreoffice-convert is an npm package for Node.js that converts office documents to other formats using LibreOffice.

First, install libreoffice-convert npm package using the npm install command

npm install libreoffice-convert --save

Example code to convert docx to pdf using the libreoffice-convert package:

const libre = require('libreoffice-convert');
const path = require('path');
const fs = require('fs');

async function ConvertDocToPdf() {
    try {
        const inputPath = path.join(__dirname, "test.docx");
        const outputPath = path.join(__dirname, "test.pdf");
        // fs.promises.readFile returns a Promise, so it can be awaited
        const docData = await fs.promises.readFile(inputPath);
        return new Promise((resolve, reject) => {
            libre.convert(docData, '.pdf', undefined, (err, done) => {
                if (err) {
                    // Stop here so an undefined buffer is never written
                    return reject('Conversion failed');
                }
                fs.writeFileSync(outputPath, done);
                resolve("Conversion successful");
            });
        })
    } catch (err) {
        console.log("Error in input reading", err);
    }
}

The sequence of steps in the above code:

  • Import the libreoffice-convert, fs, and path modules
  • Define the function with the async keyword for asynchronous processing
  • Read the input file using the readFile method of the fs module
  • Call libre.convert to convert the docx data to PDF
  • Wrap the conversion code in a Promise
  • Reject the Promise if the conversion fails
  • Write the output PDF file using the writeFileSync method
  • Resolve the Promise for a successful conversion
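
The steps above can be sketched as a small read, convert, write pipeline in which the converter is passed in as a parameter. This is an illustrative variation, not the library's own API: convertFile, stubConvert and demo are hypothetical names, and the stub stands in for libre.convert so the example runs without LibreOffice installed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read -> convert -> write, with the converter injected by the caller.
// `convert` must have the (buffer, ext, filter, callback) shape used above.
async function convertFile(inputPath, outputPath, convert) {
  const docData = await fs.promises.readFile(inputPath);
  const pdfData = await new Promise((resolve, reject) => {
    convert(docData, '.pdf', undefined, (err, done) => (err ? reject(err) : resolve(done)));
  });
  await fs.promises.writeFile(outputPath, pdfData);
  return outputPath;
}

// Hypothetical stub converter so the sketch runs without LibreOffice.
const stubConvert = (buf, ext, filter, cb) => cb(null, Buffer.from('%PDF-stub'));

async function demo() {
  // Work in a throwaway temp folder so no real files are touched
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'doctopdf-'));
  const input = path.join(dir, 'test.docx');
  await fs.promises.writeFile(input, 'placeholder');
  const output = await convertFile(input, path.join(dir, 'test.pdf'), stubConvert);
  return fs.promises.readFile(output, 'utf8');
}

demo().then(console.log);
```

Passing the converter in makes the file handling testable on machines without an office suite; in a real project the third argument would be the convert function from libreoffice-convert.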

Conclusion

To sum up, you learned how to convert Word documents to PDF in Node.js in multiple ways.


const doc = new jsPDF();
// Load the Word file as a Blob
const file = await fetch('path/to/word.docx').then(res => res.blob());
// Convert the Word file to a base64-encoded data URI
const reader = new FileReader();
reader.readAsDataURL(file);
reader.onload = () => {
  const base64 = reader.result;
  // Create a new Blob from the data URI
  const dataURI = base64.split(',')[1];
  const byteString = atob(dataURI);
  const arrayBuffer = new ArrayBuffer(byteString.length);
  const int8Array = new Uint8Array(arrayBuffer);
  for (let i = 0; i < byteString.length; i++) {
    int8Array[i] = byteString.charCodeAt(i);
  }
  const wordFile = new Blob([arrayBuffer], { type: 'application/msword' });
  // Use PDFJS to render the Word file as a PDF
  // NOTE: PDF.js parses PDF input only, so this call is expected to fail for a .docx file
  PDFJS.getDocument(wordFile).then(pdf => {
    // Add each page of the PDF to the jsPDF document
    for (let i = 1; i <= pdf.numPages; i++) {
      pdf.getPage(i).then(page => {
        const viewport = page.getViewport({ scale: 1 });
        const canvas = document.createElement('canvas');
        const context = canvas.getContext('2d');
        canvas.height = viewport.height;
        canvas.width = viewport.width;
        page.render({ canvasContext: context, viewport: viewport }).then(() => {
          doc.addImage(canvas.toDataURL('image/png'), 'png', 0, 0, viewport.width, viewport.height);
          if (i < pdf.numPages) {
            doc.addPage();
          }
        });
      });
    }
    // Save the PDF
    doc.save('word-to-pdf.pdf');
  });
};
