Word to pdf python linux

I’m developing a web app, part of which converts uploaded docx files to PDF (after some processing). With python-docx and other libraries, most of the processing needs neither a Windows machine with Word installed nor LibreOffice on Linux (my web server is PythonAnywhere: Linux, but without LibreOffice and without sudo or apt install permissions). Converting to PDF, however, seems to require one of those. From exploring questions here and elsewhere, this is what I have so far:

import os
import subprocess

try:
    from comtypes import client
except ImportError:
    client = None

def doc2pdf(doc):
    """
    convert a doc/docx document to pdf format
    :param doc: path to document
    """
    doc = os.path.abspath(doc) # bugfix - searching files in windows/system32
    if client is None:
        return doc2pdf_linux(doc)
    name, ext = os.path.splitext(doc)
    word = client.CreateObject('Word.Application')
    worddoc = None
    try:
        worddoc = word.Documents.Open(doc)
        worddoc.SaveAs(name + '.pdf', FileFormat=17)  # 17 = wdFormatPDF
    finally:
        # close only what was actually opened, then always quit Word
        if worddoc is not None:
            worddoc.Close()
        word.Quit()


def doc2pdf_linux(doc):
    """
    convert a doc/docx document to pdf format (linux only, requires libreoffice)
    :param doc: path to document
    """
    cmd = 'libreoffice --convert-to pdf'.split() + [doc]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() waits for the process itself; calling wait() first can deadlock on full pipes
    stdout, stderr = p.communicate(timeout=10)
    if stderr:
        raise subprocess.SubprocessError(stderr)

As you can see, one method requires comtypes, another requires libreoffice as a subprocess. Other than switching to a more sophisticated hosting server, is there any solution?

docx2pdf

Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).

On Windows, this is implemented via win32com while on macOS this is implemented via JXA (Javascript for Automation, aka AppleScript in JS).

Install

On macOS:

brew install aljohri/-/docx2pdf

Via pipx:

pipx install docx2pdf

Via pip:

pip install docx2pdf

CLI

usage: docx2pdf [-h] [--keep-active] [--version] input [output]

Example Usage:

Convert single docx file in-place from myfile.docx to myfile.pdf:
    docx2pdf myfile.docx

Batch convert docx folder in-place. Output PDFs will go in the same folder:
    docx2pdf myfolder/

Convert single docx file with explicit output filepath:
    docx2pdf input.docx output.pdf

Convert single docx file and output to a different explicit folder:
    docx2pdf input.docx output_dir/

Batch convert docx folder. Output PDFs will go to a different explicit folder:
    docx2pdf input_dir/ output_dir/

positional arguments:
  input          input file or folder. batch converts entire folder or convert
                 single file
  output         output file or folder

optional arguments:
  -h, --help     show this help message and exit
  --keep-active  prevent closing word after conversion
  --version      display version and exit

Library

from docx2pdf import convert

convert("input.docx")
convert("input.docx", "output.pdf")
convert("my_docx_folder/")

See CLI docs above (or in docx2pdf --help) for all the different invocations. It is the same for the CLI and python library.
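
Because the library drives Microsoft Word, a caller on a Linux server may want to fail early with a clear message rather than deep inside the conversion. The following is a minimal sketch of such a guard, not part of docx2pdf itself: the helper names word_automation_available and convert_if_supported are hypothetical, and the platform check simply assumes that Word automation is only possible where the README says it is (Windows and macOS).

```python
import sys


def word_automation_available(platform=sys.platform):
    """Return True on the platforms where docx2pdf can drive Word (Windows, macOS)."""
    return platform in ("win32", "darwin")


def convert_if_supported(src, dst=None, platform=sys.platform):
    """Call docx2pdf.convert only where Word automation can work (hypothetical wrapper)."""
    if not word_automation_available(platform):
        raise RuntimeError(
            "docx2pdf needs Microsoft Word; unsupported platform: " + platform
        )
    # Imported lazily so that this module can still be imported on Linux.
    from docx2pdf import convert

    if dst is None:
        convert(src)
    else:
        convert(src, dst)
```

On a Linux host, convert_if_supported("input.docx") raises RuntimeError, which the web app can catch in order to fall back to another converter.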

Jupyter Notebook

If you are using this in the context of jupyter notebook, you will need ipywidgets for the tqdm progress bar to render properly.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

    Tired of online docx-to-PDF converters with clumsy interfaces and conversion limits? Python's docx2pdf module converts the files locally instead.

    This module can be used to convert files singly or in bulk using the command line or a python program.

    Installation

    This module does not come built-in with Python. To install this module type the below command in the terminal.

    pip install docx2pdf

    Conversion using the command line

    The basic structure of the docx2pdf command line usage is:

    docx2pdf [input] [output]

    If only the input file is specified, it generates a pdf from the docx and stores it in the same folder.

    Example: [screenshots: docx2pdf usage on the command line; the GeeksforGeeks folder containing both the original GFG.docx and the converted GFG.pdf; the original GFG.docx next to GFG.pdf]

    For the bulk conversion, you can specify the folder containing all the Docx files. The converted pdfs will get stored in the same folder.

    docx2pdf GeeksForGeeks_Folder/

    You can also explicitly specify the input and output file or folder by specifying the path.
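
The output rules described above can be sketched in a few lines. This is only an illustration of the described behaviour for a single input file, not docx2pdf's actual implementation: the function name resolve_output is hypothetical, and it assumes that an output argument ending in .pdf is a file while anything else is a folder.

```python
from pathlib import Path


def resolve_output(input_path, output=None):
    """Return the PDF path for one input docx under the rules described above."""
    src = Path(input_path)
    if output is None:
        # No output given: the PDF lands next to the docx, same base name.
        return src.with_suffix(".pdf")
    out = Path(output)
    if out.suffix.lower() == ".pdf":
        # Explicit output file.
        return out
    # Assumption: anything else is an output folder.
    return out / (src.stem + ".pdf")
```

For example, resolve_output("input.docx", "output_dir/") gives the path output_dir/input.pdf.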

    Conversion by importing the module and using it in the program

    An endless number of useful applications can be made using this module.

    from docx2pdf import convert

    convert("GFG.docx")

    convert("GeeksForGeeks/GFG_1.docx", "Other_Folder/Mine.pdf")

    convert("GeeksForGeeks")

    1. Convert Docx to PDF With the pywin32 Package in Python
    2. Convert Docx to PDF With the docx2pdf Package in Python

    Convert Docx to PDF in Python

    This tutorial will discuss the methods to convert a docx file to a pdf file in Python.

    Convert Docx to PDF With the pywin32 Package in Python

    The pywin32 package is generally used for creating and initializing COM objects and using Windows services in Python. As it is an external package, we have to install pywin32 before using it, with the command below.

    pip install pywin32

    We can use the Microsoft Word application with this package to open the docx file and save it as a pdf file. The following code example shows us how to convert a docx file to a pdf file with the pywin32 package.

    import os
    import win32com.client
    
    wdFormatPDF = 17
    
    inputFile = os.path.abspath("document.docx")
    outputFile = os.path.abspath("document.pdf")
    word = win32com.client.Dispatch('Word.Application')
    doc = word.Documents.Open(inputFile)
    doc.SaveAs(outputFile, FileFormat=wdFormatPDF)
    doc.Close()
    word.Quit()
    

    In the above code, we converted document.docx to document.pdf with the win32com.client library. We opened the docx file with doc = word.Documents.Open(inputFile) and saved it as a PDF with doc.SaveAs(outputFile, FileFormat=wdFormatPDF), where 17 is Word's wdFormatPDF constant. Finally, we closed the document with doc.Close() and exited Microsoft Word with word.Quit(). SaveAs writes the output file itself, so document.pdf does not have to exist beforehand. The following variant creates an empty document.pdf first with Python file handling; that step is optional.

    import os
    import win32com.client
    
    wdFormatPDF = 17
    
    inputFile = os.path.abspath("document.docx")
    outputFile = os.path.abspath("document.pdf")
    file = open(outputFile, "w")
    file.close()
    word = win32com.client.Dispatch('Word.Application')
    doc = word.Documents.Open(inputFile)
    doc.SaveAs(outputFile, FileFormat=wdFormatPDF)
    doc.Close()
    word.Quit()
    

    In the above code, we create the output file with file = open(outputFile, "w") before opening Microsoft Word with the win32com.client library.

    Convert Docx to PDF With the docx2pdf Package in Python

    The pywin32 method works and gives us a lot of control over the details; the only drawback is that it takes a lot of code. To convert a docx file to a PDF quickly without handling those details, we can use the docx2pdf package, whose functions take the file names and handle the conversion steps discussed in the previous section. docx2pdf is also an external package, installed with the command below.

    pip install docx2pdf

    The following code example shows us how to convert a docx file to a pdf file with the docx2pdf package.

    from docx2pdf import convert
    
    inputFile = "document.docx"
    outputFile = "document2.pdf"
    
    convert(inputFile, outputFile)
    

    In the above code, we converted document.docx to document2.pdf with the convert() function of the docx2pdf package. convert() writes the output file itself, so document2.pdf does not have to exist beforehand. The variant below creates an empty output file first with file handling, as in the previous section; that step is optional.

    from docx2pdf import convert
    
    inputFile = "document.docx"
    outputFile = "document2.pdf"
    file = open(outputFile, "w")
    file.close()
    
    convert(inputFile, outputFile)
    

    In the above code, we create the output file with file = open(outputFile, "w") before calling the convert() function.

    Another option is LibreOffice. However, as the first responder said, the output quality will not match what you get from Microsoft Word itself through comtypes.

    Once LibreOffice is installed, here is the code to do it.

    from subprocess import Popen

    # Default install location on Windows; adjust if LibreOffice lives elsewhere.
    LIBRE_OFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"
    
    def convert_to_pdf(input_docx, out_folder):
        cmd = [LIBRE_OFFICE, '--headless', '--convert-to', 'pdf', '--outdir',
               out_folder, input_docx]
        print(cmd)  # show the exact command being run
        p = Popen(cmd)
        p.communicate()  # wait for LibreOffice to finish
    
    
    sample_doc = 'file.docx'
    out_folder = 'some_folder'
    convert_to_pdf(sample_doc, out_folder)
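
A slightly more defensive variant of the same idea is sketched below. It uses only the flags already shown above (--headless, --convert-to, --outdir); the function names, the 60-second timeout, and looking the executable up on PATH under the names soffice or libreoffice are assumptions for illustration, not something the answer prescribes.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(soffice, input_docx, out_folder):
    """Build the LibreOffice command line used for a headless PDF conversion."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_folder), str(input_docx)]


def convert_with_libreoffice(input_docx, out_folder, timeout=60):
    """Convert one document and return the path of the PDF that was written."""
    # Assumption: the executable is on PATH under one of these two names.
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise FileNotFoundError("LibreOffice (soffice) was not found on PATH")
    cmd = build_soffice_command(soffice, input_docx, out_folder)
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired if the conversion hangs.
    subprocess.run(cmd, check=True, timeout=timeout,
                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    pdf_path = Path(out_folder) / (Path(input_docx).stem + ".pdf")
    if not pdf_path.exists():
        raise RuntimeError("LibreOffice finished but wrote no file at " + str(pdf_path))
    return pdf_path
```

Checking that the PDF exists afterwards, rather than treating any stderr output as a failure, avoids false alarms from warnings printed on stderr.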
    

    The PythonAnywhere help pages offer information on working with PDF files here: https://help.pythonanywhere.com/pages/PDF

    Summary: PythonAnywhere has a number of Python packages for PDF manipulation installed, and one of them may do what you want. However, shelling out to abiword seems easiest to me. The shell command abiword --to=pdf filetoconvert.docx will convert the docx file to a PDF and produce a file named filetoconvert.pdf in the same directory as the docx. Note that this command will output an error message to the standard error stream complaining about XDG_RUNTIME_DIR (or at least it did for me), but it still works, and the error message can be ignored.
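
From Python, the abiword call described above could be wrapped roughly as follows. This is a sketch under the assumptions stated in the summary (the PDF appears next to the docx with the same base name, and the stderr message can be ignored); the function names and the 60-second timeout are hypothetical.

```python
import subprocess
from pathlib import Path


def abiword_command(docx_path):
    """Build the shell command described above: abiword --to=pdf <file>."""
    return ["abiword", "--to=pdf", str(docx_path)]


def expected_pdf_path(docx_path):
    """The PDF is written next to the docx, with the same base name."""
    return Path(docx_path).with_suffix(".pdf")


def convert_with_abiword(docx_path, timeout=60):
    """Run abiword and return the path of the PDF it produced."""
    # stderr is discarded because of the harmless XDG_RUNTIME_DIR complaint.
    subprocess.run(abiword_command(docx_path), timeout=timeout,
                   stderr=subprocess.DEVNULL)
    pdf_path = expected_pdf_path(docx_path)
    # Judge success by the output file rather than by anything abiword prints.
    if not pdf_path.exists():
        raise RuntimeError("abiword did not produce " + str(pdf_path))
    return pdf_path
```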

    Here is docx-to-PDF code for Linux. On Windows, install LibreOffice and pass the full path to soffice.exe in place of 'soffice'.

    import subprocess
    
    def generate_pdf(doc_path, path):
        # Convert doc_path to PDF with LibreOffice; the PDF is written into the folder `path`.
        subprocess.call(['soffice',
                         # '--headless',
                         '--convert-to',
                         'pdf',
                         '--outdir',
                         path,
                         doc_path])
        return doc_path

    generate_pdf("docx_path.docx", "output_path")
    
