Reading time: 10 minutes | Coding time: 15 minutes
In this article, you will learn how to create PDF files from your Word documents. Until recently, creating complex or elegant PDFs in JavaScript was challenging. Here I'm going to show you, step by step, how to create beautiful PDFs.
Before we dive into the process, let's look at what a Word document is, what a PDF is, and why you would convert one to the other.
Microsoft Word is a popular word-processing program used primarily for creating documents such as letters, brochures, learning activities, tests, quizzes and students' homework assignments. DOC stands for DOCument file. A DOC file can contain formatted text, images, tables, graphs, charts, page formatting, and print settings.
The Portable Document Format (PDF) is a file format developed by Adobe in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.
PDFs were created to avoid the changes that a different hardware or software setup can introduce, which you may have run into if you have ever used a Word document. For example, when you create a Word document and take it to a print shop, the shop may use a different OS, hardware, or even software, so the printed document may not match what you saved on your computer: there may be large gaps between lines, or the formatting may be distorted. That's where PDF comes to the rescue.
Word to PDF approaches
There are three methods that I am going to discuss today which are very easy to use and produce excellent results. These are using:
- awesome-unoconv
- libreoffice-convert
- docx-pdf
1. awesome-unoconv
awesome-unoconv is a Node.js wrapper for converting Office files to PDF or HTML.
REQUIREMENT
Unoconv is required, which in turn requires LibreOffice (or OpenOffice).
On Debian-based Linux distributions, you can install unoconv with the following command:
sudo apt-get install unoconv
INSTALLATION
npm install awesome-unoconv
CODE
const path = require('path');
const unoconv = require('awesome-unoconv');
// Place your Word file in source
const sourceFilePath = path.resolve('./word_file.docx');
const outputFilePath = path.resolve('./myDoc.pdf');
unoconv
  .convert(sourceFilePath, outputFilePath)
  .then(result => {
    console.log(result); // return outputFilePath
  })
  .catch(err => {
    console.log(err);
  });
Now run the script from the terminal with node. The PDF will be created in the current working directory with the name «myDoc.pdf» (you can choose any name you like).
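Rather than hard-coding the output name, the PDF path can be derived from the source path. The following is a minimal sketch using only Node's built-in path module; the helper name toPdfPath is hypothetical and not part of awesome-unoconv.

```javascript
const path = require('path');

// Hypothetical helper: swap the source file's extension for ".pdf",
// keeping the same directory and base name.
function toPdfPath(sourceFilePath) {
  const { dir, name } = path.parse(sourceFilePath);
  return path.join(dir, `${name}.pdf`);
}

console.log(toPdfPath(path.join('docs', 'word_file.docx')));
```

The returned value could then be passed as the second argument to the convert call shown above.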
This is one way to convert a Word document to PDF. Let's keep going.
2. libreoffice-convert
A simple and fast node.js module for converting office documents to different formats.
DEPENDENCY
Install LibreOffice in /Applications (Mac), with your favorite package manager (Linux), or with the MSI installer (Windows).
INSTALLATION
npm install libreoffice-convert
CODE
const libre = require('libreoffice-convert');
const path = require('path');
const fs = require('fs');
const extend = '.pdf';
const FilePath = path.join(__dirname, './word_file.docx');
const outputPath = path.join(__dirname, `./example${extend}`);
// Read file
const enterPath = fs.readFileSync(FilePath);
// Convert it to pdf format with undefined filter (see Libreoffice doc about filter)
libre.convert(enterPath, extend, undefined, (err, done) => {
  if (err) {
    console.log(`Error converting file: ${err}`);
    return; // stop here: there is no output buffer to write on failure
  }
  // Here in done you have pdf file which you can save or transfer in another stream
  fs.writeFileSync(outputPath, done);
});
Since this package depends on LibreOffice, it is only useful on machines where LibreOffice is installed. It is a popular choice on Linux and Mac.
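The callback style used above can be adapted to Promises with Node's built-in util.promisify. The following is a minimal sketch, assuming the converter follows the (buffer, ext, filter, callback) shape shown in the code above; makeAsyncConverter and fakeConvert are hypothetical names, and fakeConvert only stands in for libre.convert so the snippet runs without LibreOffice.

```javascript
const { promisify } = require('util');

// convertFn is any function with the (buffer, ext, filter, callback) shape
// used above, for example libre.convert. Passing it in keeps this helper
// free of a direct LibreOffice dependency.
function makeAsyncConverter(convertFn) {
  const convertAsync = promisify(convertFn);
  return (inputBuffer, ext = '.pdf') => convertAsync(inputBuffer, ext, undefined);
}

// Hypothetical stand-in for libre.convert, used only to show the call shape.
function fakeConvert(buffer, ext, filter, callback) {
  callback(null, Buffer.from(`converted ${buffer.length} bytes to ${ext}`));
}

const convert = makeAsyncConverter(fakeConvert);
convert(Buffer.from('abc'))
  .then((pdfBuffer) => console.log(pdfBuffer.toString()))
  .catch((err) => console.error(`Error converting file: ${err}`));
```

With the real package, makeAsyncConverter(libre.convert) would be the intended call; a failed conversion then surfaces as a rejected Promise instead of an err argument.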
Let's look at the third and last method to convert a Word document to PDF.
3. docx-pdf
It is a library that converts a docx file to PDF. Of the three, I found it the easiest to use and the one with the best output quality.
INSTALLATION
npm install docx-pdf
CODE
var docxConverter = require('docx-pdf');
docxConverter('./word_file.docx', './output.pdf', function(err, result) {
  if (err) {
    console.log(err);
    return; // do not log a result when the conversion failed
  }
  console.log('result', result);
});
The output should be output.pdf, created at the output path you provided.
Question
Which one of the following is the correct command to install any module/package in your project?
npm install package-name
node install package-name
install package-name
nodejs install package-name
npm install package-name is the correct syntax. Example:
If you want to install the express module, you can do it by running the command
> npm install express
If you want to see the project you can use this link to my github page.
In this step-by-step tutorial, we’ll walk you through how to generate an Office Word or DOCX document programmatically and then either download it or display it in Apryse WebViewer using the JavaScript PDF library.
Everything to generate DOCX and save as a PDF is done in client-side JavaScript without any MS Office or server-side dependencies. You can do this in a vanilla JS app, React, or any other framework of your choice. The same functionality is also available if you are building a Node.js app.
For this walkthrough, we’ll use a React app. If you want to skip the steps and just look at the code, you can check out the repo on our GitHub.
Now let's get started. First, generate a React app by running the following command, followed by the project directory name of your choice:
npx create-react-app
Generate DOCX Document Programmatically
We’ll use the popular DOCX library by Dolan. This library allows us to generate DOCX documents programmatically, directly in the browser or in a Node.js environment.
npm i docx
Inside of App.js, let's create a new function to generate our document.
import './App.css';
import { Document, Packer, Paragraph, TextRun } from 'docx';
function App() {
  // generate DOCX document
  const generateDocx = async () => {
    const doc = new Document({
      sections: [
        {
          properties: {},
          children: [
            new Paragraph({
              children: [
                new TextRun(`DOCX lib and PDFTron's WebViewer is cutting-edge`),
              ],
            }),
          ],
        },
      ],
    });
    const blob = await Packer.toBlob(doc);
    return blob;
  };
  return <div className='App'></div>;
}
export default App;
So far, we have a Word document with a text paragraph that says "DOCX lib and PDFTron's WebViewer is cutting-edge". At this stage, we can go ahead and download the DOCX document; however, let's first add some tables, populate those with data, and then display the document back to the user.
Display Generated DOCX to the User
First, we’ll need to add a viewing component to display our DOCX documents. Apryse WebViewer component can also be used to convert from DOCX to PDF, client-side, without any Office dependencies.
Create a new component in src/components/Viewer.js:
import React, { useRef, useEffect, useContext } from 'react';
import WebViewer from '@pdftron/webviewer';
import WebViewerContext from '../context/webviewer.js';
const Viewer = () => {
  const viewer = useRef(null);
  const { setInstance } = useContext(WebViewerContext);
  // if using a class, equivalent of componentDidMount
  useEffect(() => {
    WebViewer(
      {
        path: '/webviewer/lib',
      },
      viewer.current
    ).then((instance) => {
      setInstance(instance);
    });
  }, []);
  return <div className='webviewer' ref={viewer} style={{height: "100vh"}}></div>;
};
export default Viewer;
We'll use context to reuse the WebViewer instance. Create a new file in src/context/webviewer.js:
import React from 'react';
const WebViewerContext = React.createContext({});
export default WebViewerContext;
Then, update our App.js:
import './App.css';
import { useEffect, useState } from 'react';
import { Document, Packer, Paragraph, TextRun } from 'docx';
import Viewer from './components/Viewer';
import WebViewerContext from './context/webviewer.js';
function App() {
  const [instance, setInstance] = useState();
  // generate DOCX document
  const generateDocx = async () => {
    const doc = new Document({
      sections: [
        {
          properties: {},
          children: [
            new Paragraph({
              children: [
                new TextRun(`DOCX lib and PDFTron's WebViewer is awesome!`),
              ],
            }),
          ],
        },
      ],
    });
    const blob = await Packer.toBlob(doc);
    return blob;
  };
  useEffect(() => {
    const generateAndLoadDocument = async () => {
      const docBlob = await generateDocx();
      await instance.Core.documentViewer.loadDocument(docBlob, {
        extension: 'docx',
      });
    };
    if (instance) {
      generateAndLoadDocument();
    }
  }, [instance]);
  return (
    <WebViewerContext.Provider value={{ instance, setInstance }}>
      <div className='App'>
        <Viewer />
      </div>
    </WebViewerContext.Provider>
  );
}
export default App;
At this stage, we’ve generated and loaded our Word document in WebViewer. From here, users can annotate, sign, highlight, and comment on the document.
Convert DOCX to PDF in the Browser
If we wanted to simply convert and download a PDF instead, and the UI is not needed, here’s how WebViewer can be leveraged to create a blob:
// documentViewer and annotationManager come from the WebViewer instance
const { documentViewer, annotationManager } = instance.Core;
// saves the document with annotations in it
const doc = documentViewer.getDocument();
const xfdfString = await annotationManager.exportAnnotations();
const data = await doc.getFileData({xfdfString});
const arr = new Uint8Array(data);
const blob = new Blob([arr], { type: 'application/pdf' });
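The file data above is raw bytes, so wrapping it in a Blob and handing it to the browser's download mechanism are two separate steps. The following is a minimal sketch, assuming a runtime that provides the standard Blob global (browsers, or Node.js 18+); the helper names toPdfBlob and downloadBlob are hypothetical and not part of WebViewer.

```javascript
// Wrap raw bytes (ArrayBuffer or typed array) in a PDF-typed Blob.
function toPdfBlob(data) {
  return new Blob([new Uint8Array(data)], { type: 'application/pdf' });
}

// Browser-only: trigger a download by clicking a temporary link.
// Defined here for illustration; it is not called outside a browser.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url);
}

// Sample bytes standing in for the data returned by getFileData.
const sample = toPdfBlob(new TextEncoder().encode('%PDF-1.7'));
console.log(sample.type, sample.size);
```

In the browser, downloadBlob(blob, 'converted.pdf') would be the follow-up call once the Blob exists; the file name is only an example.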
You can also load the library without initializing a WebViewer instance. Here is a guide that demonstrates how to get Core.
If you’d like to do it server-side, you can explore the Node.js guide.
Wrap Up
We hope you found this article helpful! If you have any questions or feedback, feel free to email me directly.
It appears that even after three years ncohen had not found an answer. It was also unclear if it had to be a free (as in dollars) solution.
The original requirements were:
using client side resources only and no plugins
Do you mean you don’t want server side conversion? Right, I would like my app to be totally autonomous.
Since all the other answers/comments only offered server side component solutions, which the author clearly stated was not what they wanted, here is a proposed answer.
The company I work for has had this solution for a few years now, that can convert DOCX (not odt yet) files to PDF completely in the browser, with no server side component required. This currently uses either asm.js/PNaCl/WASM depending on the exact browser being used.
https://www.pdftron.com/samples/web/samples/viewing/viewing/
Open an office file using the demo above, and you will see no server communication. Everything is done client side. This demo works on mobile browsers also.
In this post, you will learn how to convert docx files to PDF documents in JavaScript and Node.js.
Docx/doc are document file formats from Microsoft that contain images, text, tables, and styles.
PDF is a separate format from Adobe for representing content such as images, text, and styles.
There are many online tools that convert doc to pdf. Sometimes, as a programmer, you need to perform the conversion inside a JavaScript/Node.js application.
JavaScript/Node.js offers multiple ways to convert using npm packages:
- docx-pdf
- libreoffice-convert
How to Convert word document to pdf in Nodejs application
First, create a Node.js application from scratch by running the npm init -y command in a new folder:
B:\blog\jswork\nodework\doctopdf>npm init -y
Wrote to B:\blog\jswork\nodework\doctopdf\package.json:
{
  "name": "doctopdf",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}
Convert docx to pdf using the docx-pdf library
docx-pdf is a simple library to convert a docx file to a pdf document.
First, install the docx-pdf npm library:
npm install docx-pdf --save
This will add a dependency in package.json as follows
{
"dependencies": {
"docx-pdf": "0.0.1"
}
}
In JavaScript, import docx-pdf using require (CommonJS modules):
var converter = require('docx-pdf');
The converter function accepts:
- an input file, which is the Word document
- an output file, which is the name of the PDF document
- a callback that receives err, holding the error when the conversion fails, and result on success; result is an object whose filename attribute holds the name and path of the PDF document
var converter = require('docx-pdf');
converter('test.docx', 'output.pdf', function(err, result) {
  if (err) {
    console.log("Converting Doc to PDF failed", err);
    return; // skip the success message when the conversion failed
  }
  console.log("Converting Doc to PDF successful", result);
});
The same code can be written with a Promise and the async and await keywords for asynchronous processing.
Convert docx to pdf with async/await
This is useful for larger files.
The function declared below accepts the input and output filenames. It returns a Promise that rejects on failed conversions and resolves on successful ones. The docxConverter call is wrapped inside the Promise so that callers can use await.
const docxConverter = require('docx-pdf');
// Wrap the callback-based converter in a Promise so callers can await it
function ConvertDocToPdf(inputfile, outputfile) {
  return new Promise((resolve, reject) => {
    docxConverter(inputfile, outputfile, (err, result) => {
      return err ? reject(err) : resolve(result);
    });
  });
}
Call the function with the await keyword from inside an async function (or at the top level of an ES module):
await ConvertDocToPdf("test.docx", "test.pdf")
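Because the function returns a Promise, a rejected conversion surfaces as an exception at the await site. The following is a minimal sketch of calling it from a CommonJS script, where top-level await is not available. fakeConvertDocToPdf is a hypothetical stand-in with the same (inputfile, outputfile) shape so the snippet runs without docx-pdf installed, and convertSafely is likewise a hypothetical helper name.

```javascript
// Hypothetical stand-in for ConvertDocToPdf: resolves with a result object
// for .docx inputs and rejects otherwise.
function fakeConvertDocToPdf(inputfile, outputfile) {
  return new Promise((resolve, reject) => {
    if (inputfile.endsWith('.docx')) {
      resolve({ filename: outputfile });
    } else {
      reject(new Error(`Unsupported input: ${inputfile}`));
    }
  });
}

// Convert one file and report the outcome instead of letting the
// rejection go unhandled.
async function convertSafely(convertFn, inputfile, outputfile) {
  try {
    const result = await convertFn(inputfile, outputfile);
    return { ok: true, result };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// An async wrapper function provides a place to use await.
(async () => {
  console.log(await convertSafely(fakeConvertDocToPdf, 'test.docx', 'test.pdf'));
  console.log(await convertSafely(fakeConvertDocToPdf, 'notes.txt', 'notes.pdf'));
})();
```

Passing the real ConvertDocToPdf as convertFn would be the intended use.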
It is a simple library; its only disadvantage is that it cannot convert formatting styles.
libreoffice-convert npm package
LibreOffice is an open-source office suite for managing office documents.
libreoffice-convert is an npm package for Node.js that converts office documents, including Word documents, to other formats.
First, install the libreoffice-convert npm package using the npm install command:
npm install libreoffice-convert --save
Example code to convert docx to pdf using the libreoffice-convert package:
const libre = require('libreoffice-convert');
const path = require('path');
const fs = require('fs');
async function ConvertDocToPdf() {
  try {
    const inputPath = path.join(__dirname, "test.docx");
    const outputPath = path.join(__dirname, `/test.pdf`);
    // fs.promises.readFile returns a Promise, so it can be awaited
    let docData = await fs.promises.readFile(inputPath);
    return new Promise((resolve, reject) => {
      libre.convert(docData, '.pdf', undefined, (err, done) => {
        if (err) {
          reject('Conversion failed');
          return; // no output buffer to write on failure
        }
        fs.writeFileSync(outputPath, done);
        resolve("Conversion successful");
      });
    });
  } catch (err) {
    console.log("Error in input reading", err);
  }
}
The sequence of steps in the above code:
- Import the libreoffice-convert, fs, and path modules
- Define the function with the async keyword for asynchronous processing
- Read the input file using the readFile method of the fs module
- Call libre.convert to convert the docx to a pdf
- The conversion code is wrapped in a Promise object
- For failed conversions, the promise is rejected
- For a successful conversion, the output pdf file is written using the writeFileSync method
- Finally, the promise is resolved
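Before writing the converted buffer to disk, it can be worth checking that it looks like a PDF at all. PDF files begin with the marker %PDF-, so a quick header check is possible with Node's Buffer API alone. The following is a minimal sketch; looksLikePdf is a hypothetical helper, not part of libreoffice-convert, and it checks only the header rather than validating the whole file.

```javascript
// Returns true when the buffer starts with the "%PDF-" header marker.
function looksLikePdf(buffer) {
  const marker = Buffer.from('%PDF-', 'latin1');
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length >= marker.length &&
    buffer.subarray(0, marker.length).equals(marker)
  );
}

// A PDF-style header passes the check.
console.log(looksLikePdf(Buffer.from('%PDF-1.7\n')));
// A .docx file is a ZIP archive, which starts with "PK" instead.
console.log(looksLikePdf(Buffer.from('PK\u0003\u0004')));
```

A check like this could sit just before the writeFileSync call, rejecting the promise when the buffer does not start with the marker.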
Conclusion
To sum up, you learned how to convert Word to PDF in Node.js in multiple ways.
// Assumes jsPDF and PDF.js (as the PDFJS global) are already loaded on the page.
// Note: PDF.js parses PDF data only, so getDocument() is expected to fail on
// raw DOCX bytes; the rendering steps apply once the input is an actual PDF.
const doc = new jsPDF();
// Load the Word file as a Blob
const file = await fetch('path/to/word.docx').then(res => res.blob());
// Convert the Word file to a base64-encoded data URI
const reader = new FileReader();
reader.readAsDataURL(file);
reader.onload = () => {
  const base64 = reader.result;
  // Create a new Blob from the data URI
  const dataURI = base64.split(',')[1];
  const byteString = atob(dataURI);
  const arrayBuffer = new ArrayBuffer(byteString.length);
  const int8Array = new Uint8Array(arrayBuffer);
  for (let i = 0; i < byteString.length; i++) {
    int8Array[i] = byteString.charCodeAt(i);
  }
  const wordFile = new Blob([arrayBuffer], { type: 'application/msword' });
  // Use PDFJS to render the Word file as a PDF
  PDFJS.getDocument(wordFile).then(async pdf => {
    // Add each page of the PDF to the jsPDF document, one page at a time
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const viewport = page.getViewport({ scale: 1 });
      const canvas = document.createElement('canvas');
      const context = canvas.getContext('2d');
      canvas.height = viewport.height;
      canvas.width = viewport.width;
      await page.render({ canvasContext: context, viewport: viewport });
      doc.addImage(canvas.toDataURL('image/png'), 'png', 0, 0, viewport.width, viewport.height);
      if (i < pdf.numPages) {
        doc.addPage();
      }
    }
    // Save the PDF after every page has been added
    doc.save('word-to-pdf.pdf');
  });
};
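The byte-copy loop above can be isolated into a small function, which makes the decoding step easy to check on its own. The following is a minimal sketch, assuming a runtime with the standard atob global (browsers, or Node.js 16+); dataUriToBytes is a hypothetical helper name.

```javascript
// Decode the base64 payload of a data URI into a Uint8Array.
function dataUriToBytes(dataUri) {
  const base64 = dataUri.split(',')[1];
  const byteString = atob(base64);
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    bytes[i] = byteString.charCodeAt(i);
  }
  return bytes;
}

// "SGk=" is the base64 encoding of the two ASCII bytes "Hi".
console.log(dataUriToBytes('data:text/plain;base64,SGk='));
```

The resulting Uint8Array can be passed directly to the Blob constructor in place of the manually filled ArrayBuffer.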