Word to latex python

Project description

word2Tex: Citation handling and fixing application

The modules included in this package are cite2Tex and fixBibTex which help with
citations when migrating a manuscript into LaTeX.

Installation

pip install word2Tex

cite2Tex

This module can be used as a command-line tool or from Python. It goes through a
text file and converts citations into LaTeX format (e.g. Viena et al. 2018 ->
\cite{Viena2018}). If a BibTeX bibliography file is provided, citations
are looked up by author and year and the matching citation key is used.
In that case, citations not found in the bib file are left
alone, and you are notified which citations are missing from it.
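
The behaviour described above can be sketched with a regular expression. This is only an illustration of the idea, not the package's actual implementation: the names CITATION_RE, BIB_KEYS and convert_citations are hypothetical, the pattern is deliberately narrow (it only recognises "Surname 2018" and "Surname et al. 2018"), and the dictionary stands in for a parsed .bib file.

```python
import re

# Hypothetical stand-in for a parsed .bib file:
# (first-author surname, year) -> citation key.
BIB_KEYS = {("Viena", "2018"): "Viena2018"}

# Narrow illustrative pattern: "Surname 2018" or "Surname et al. 2018".
CITATION_RE = re.compile(r"\b([A-Z][A-Za-z-]+)(?: et al\.)? (\d{4})\b")


def convert_citations(text, bib_keys=None):
    """Return (converted_text, missing).

    Without bib_keys every match becomes \\cite{SurnameYear}. With bib_keys,
    matches that are not in the table are left alone and reported in missing.
    """
    missing = []

    def replace(match):
        author, year = match.group(1), match.group(2)
        if bib_keys is None:
            return "\\cite{%s%s}" % (author, year)
        key = bib_keys.get((author, year))
        if key is None:
            missing.append(match.group(0))
            return match.group(0)  # not in the bib: leave untouched
        return "\\cite{%s}" % key

    return CITATION_RE.sub(replace, text), missing


if __name__ == "__main__":
    sample = "As shown by Viena et al. 2018, and Smith 2020."
    converted, not_found = convert_citations(sample, BIB_KEYS)
    print(converted)
    print("Missing from bib:", not_found)
```

A pattern this simple also matches ordinary phrases such as "In 2018", which is one reason checking the list of matches before converting a file is worthwhile.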

Usage

To use in the command line:

cite2Tex path_to_file.txt -b path_to_bib.bib -o path_to_output_file.txt

The -b and -o flags are optional.

To use in Python:

from word2Tex import cite2Tex as c2t
fn = 'path_to_file.txt'
bib_fn = 'path_to_bib.bib' # optional
save_file = 'path_to_save_edited_text.txt' # optional; output is always written to a new file

# View all citations in the document and see what they will become
with open(fn) as f:
    matches = c2t.find_matches(f.read(), bib=bib_fn)
# matches is a dataframe that you can view and check

# To convert a file
c2t.citations2Tex(fn, bib=bib_fn, save_file=save_file)

fixBibTex

This module corrects citation IDs in BibTeX files exported from applications such as EndNote. Citation IDs are set to AuthorYear using the first author's last name. If this produces duplicates, the article's journal initials or an index number are appended to keep the IDs unique.
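
As a rough sketch of the AuthorYear scheme described above (not fixBibTex's real code), the following builds keys from plain dictionaries. The function names, the dictionary layout, the order in which journal initials and index numbers are tried, and the "_2" suffix format are all assumptions made for illustration.

```python
import re
from collections import Counter


def journal_initials(journal):
    """Initials of the capitalised words, e.g. 'Nature Neuroscience' -> 'NN'."""
    return "".join(word[0] for word in journal.split() if word[0].isupper())


def first_author_surname(author_field):
    """Surname of the first author in a BibTeX 'A and B and C' author field."""
    first = author_field.split(" and ")[0].strip()
    # BibTeX allows both "Last, First" and "First Last".
    surname = first.split(",")[0] if "," in first else first.split()[-1]
    return re.sub(r"\W", "", surname)


def make_keys(entries):
    """Return one unique AuthorYear-style key per entry (hypothetical scheme)."""
    base = [first_author_surname(e["author"]) + str(e["year"]) for e in entries]
    counts = Counter(base)
    keys, used = [], set()
    for entry, key in zip(entries, base):
        if counts[key] > 1:
            # Duplicate AuthorYear: append the journal initials first.
            key += journal_initials(entry.get("journal", ""))
        candidate, n = key, 1
        while candidate in used:
            # Still clashing: fall back to an index number.
            n += 1
            candidate = "%s_%d" % (key, n)
        used.add(candidate)
        keys.append(candidate)
    return keys


if __name__ == "__main__":
    demo = [
        {"author": "Viena, A. and Smith, B.", "year": 2018, "journal": "Nature Neuroscience"},
        {"author": "Ann Viena", "year": 2018, "journal": "Journal of Memory"},
        {"author": "Smith, B.", "year": 2020, "journal": "Cell"},
    ]
    print(make_keys(demo))
```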

Usage

To use from command-line:

fixBibTex path_to_bib_file.bib -o output_file.bib

The output file is optional. Either way, the result is always saved to a new file to avoid data loss.

In Python:

from word2Tex import fixBibTex as fbt
fn = 'path_to_bib_file.bib'
out_fn = 'out_file_path.bib' # optional
fbt.fix_bibtexDB(fn, save_file=out_fn)

I have a docx file containing a few equations in different pages. With Python and lxml, I was successful in extracting the content. I now need to convert the equations in Word to Latex. Some of the equations are shown as:

- eq \f (sinx,\r(1 - sin 2 x))

Is there any Python library or any other tool that I can use to convert the equations to LaTeX format?

Here is a snippet of the XML file which I obtained from docxfile/word/document.xml:

<w:p w:rsidR="00677018" w:rsidRPr="007D05E5" w:rsidRDefault="00677018" w:rsidP="00677018">
            <w:pPr>
                <w:pStyle w:val="w" />
                <w:jc w:val="both" /></w:pPr>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/>
                <w:t>a.</w:t>
            </w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/></w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="begin" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:instrText xml:space="preserve">eq \b\bc\[(\a\co2\hs4(7,-3,-1,2))</w:instrText>
            </w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="end" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/>
                <w:t>b.</w:t>
            </w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:tab/></w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="begin" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:instrText xml:space="preserve">eq \f(5,8)</w:instrText>
            </w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="end" /></w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="begin" /></w:r>
            <w:r w:rsidRPr="007D05E5">
                <w:instrText xml:space="preserve">eq \b\bc\[(\a\co2\hs4(7,-3,-1,2))</w:instrText>
            </w:r>
            <w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
                <w:fldChar w:fldCharType="end" /></w:r>
        </w:p>
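
One possible starting point, sketched under stated assumptions: the equations here are Word EQ field codes stored in w:instrText elements, so they can be collected with the standard library and rewritten by a small hand-written converter. The functions below are hypothetical helpers, not an existing library. They handle only the \f (fraction) and single-argument \r (radical) switches from the example above, copy everything else through unchanged, and assume each field instruction sits in a single w:instrText run (Word can split one across several runs).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def field_instructions(document_xml):
    """Return the text of every w:instrText element, in document order."""
    root = ET.fromstring(document_xml)
    return [el.text or "" for el in root.iter("{%s}instrText" % W_NS)]


def matching_paren(text, start):
    """Index of the ')' closing the '(' at text[start], or len(text) if unbalanced."""
    depth = 0
    for k in range(start, len(text)):
        if text[k] == "(":
            depth += 1
        elif text[k] == ")":
            depth -= 1
            if depth == 0:
                return k
    return len(text)


def split_args(body):
    """Split on top-level commas, ignoring commas inside nested parentheses."""
    args, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append(current)
            current = ""
        else:
            current += ch
    args.append(current)
    return args


def eq_to_latex(expr):
    """Rewrite \\f(a,b) as \\frac{a}{b} and \\r(x) as \\sqrt{x}, recursively."""
    out, i = [], 0
    while i < len(expr):
        j = i + 2
        while j < len(expr) and expr[j] == " ":
            j += 1  # Word allows spaces between the switch and '('
        is_switch = expr[i] == "\\" and expr[i + 1:i + 2] in ("f", "r")
        if is_switch and expr[j:j + 1] == "(":
            end = matching_paren(expr, j)
            args = [eq_to_latex(a.strip()) for a in split_args(expr[j + 1:end])]
            if expr[i + 1] == "f" and len(args) == 2:
                out.append("\\frac{%s}{%s}" % tuple(args))
            elif expr[i + 1] == "r" and len(args) == 1:
                out.append("\\sqrt{%s}" % args[0])
            else:
                out.append(expr[i:end + 1])  # unsupported shape: keep as-is
            i = end + 1
        else:
            out.append(expr[i])
            i += 1
    return "".join(out)


def instruction_to_latex(instr):
    """Strip the leading 'eq' keyword (case-insensitive) and convert the rest."""
    instr = instr.strip()
    if instr[:2].lower() == "eq":
        instr = instr[2:].strip()
    return eq_to_latex(instr)


if __name__ == "__main__":
    print(instruction_to_latex(r"eq \f (sinx,\r(1 - sin 2 x))"))
```

Other switches, such as the bracket and array switches in the snippet above, would need their own cases, and plain text such as "sinx" is not turned into \sin x by this sketch.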

paperaj-template: Write journal papers or a thesis in Word and convert to LaTeX for submission! (All using GitHub Actions!)

TL;DR: You can use any LaTeX template! Sections in the Word document (see main.docx) with italicized headings are split into LaTeX files that can be included in your template. See the example in main.tex and inclusions.tex. The LaTeX files in the paperaj/ folder are autogenerated from the Word document by the GitHub action!

This is a template that uses paperaj as a GitHub action.

Paperaj is a combination of bash and Python scripts for converting an MS Word document to a LaTeX document for academic journals. You can use any journal template for LaTeX compilation. It can be used as a standalone script (needs pandoc and LaTeX installed) or as a GitHub action. Just create a repo from this template, which uses the Paperaj GitHub action, and GitHub will compile your manuscript with LaTeX!

paperaj

How it works

Paperaj creates a set of plain LaTeX files from the Word document in the paperaj folder. Images, tables and referencing are supported during the conversion. These plain LaTeX files can be included in the journal's LaTeX template using \input{filename}. See main.docx in the template for the Word document format, and main.tex in the template for how to include the Paperaj-generated LaTeX files in the LaTeX entry file.

Give us a star ⭐️

If you find this project useful, give us a star. It helps others discover the project.

Usage

As GitHub action (recommended)

  • Use this GitHub template.
  • Use the docx in the template.
  • Add bib and tex files.
  • Set the names of the docx, bib and LaTeX entry files in the paperaj.env file.
  • The template generates LaTeX files on push to the develop branch and compiles to PDF on push to the main branch!

If you want to run this locally on your computer (requires pandoc and LaTeX installed), check out paperaj.

Arguments in .env file (Needed only if compiling locally)

  • BIBLIO=references.bib
  • DOCX=article.docx
  • PDF=article.pdf
  • LATEXFOLDER=./ # no trailing /
  • LATEXENTRY=main.tex
  • BIBCOMPILE=bibtex or biber
  • TEXCOMPILE=defer or yes
  • ACRONYMS=sample.csv
  • GLOSSARY=sample.csv
  • MINDMAP=create
  • CITETAG= cite or citep
  • PANDOCPATH=

Figures

  • Use TWO_COLUMN or LATEXROTATE in the caption of a figure.
  • Use FIGURE_ or TABLE_ for inline references.

Referencing

\cite{AuthorYEAR} inline

Using Zotero

  • Use this csl

Flatten into single latex file without inclusions

  • Just create a folder called flatten.

arXiv

  • Add required latex files to arxiv folder.

Clean version for submission

  • The clean LaTeX files for submission, without LaTeX comments, are in the clean folder.

Mindmapping

PlantUML

  • ‘** first’

  • ‘*** second’

  • ‘**_’ adds title

  • Add the above to the Zotero notes for references

Notebook to pdf

  • jupyter-nbconvert --to pdf acnode.ipynb

Extract highlights from PDF

pdfannots

Other Instructions:

  • Set repo permissions to read/write.
  • Set entry.tex as the Overleaf entry file.

Contributors

  • Bell Eapen

Elaborating my answer in a comment to the question, this is what I got so far.

You need to install Python (I installed Python 2.7), lxml and PIL. The easiest way I've found to install the latter two on Windows is to go to http://www.lfd.uci.edu/~gohlke/pythonlibs/ and download lxml-2.3.4.win32-py2.7.exe and PIL-1.1.7.win32-py2.7.exe (note that you have to choose the appropriate files for your Python version). Running those installers sets up the appropriate libraries and bindings.

Then you can download https://github.com/mikemaccana/python-docx. I didn't try to install this one properly. I only uncompressed it into a folder, opened a cmd shell, navigated to that folder and ran the provided examples (example-extracttext.py and example-makedocument.py), which worked, so my setup was fine.

Then I adapted the code of example-extracttext to our needs, and wrote the following script, which I named run.py:

#!/usr/bin/env python2.7
'''
This file opens a docx (Office 2007) file and dumps the text. Then it uses pdflatex to compile it.
'''
from docx import opendocx, getdocumenttext
import os
import sys

if __name__ == '__main__':
    try:
        wordfile = sys.argv[1]
        # Derive the .tex and .log names from the input file's base name
        basename = os.path.splitext(wordfile)[0]
        latexfile = basename + '.tex'
        logfile = basename + '.log'
        document = opendocx(wordfile)
        newfile = open(latexfile, 'w')
    except Exception:
        print('Please supply an input file. For example:')
        print('''  run.py 'MyDocument.docx' ''')
        sys.exit(1)

    # Fetch all the text out of the document we just opened
    paratextlist = getdocumenttext(document)

    # Make an explicit UTF-8 encoded version
    newparatextlist = []
    for paratext in paratextlist:
        newparatextlist.append(paratext.encode("utf-8"))

    # Write the document's text with two newlines under each paragraph
    newfile.write('\n\n'.join(newparatextlist))
    newfile.close()

    # Now use pdflatex to compile the result
    os.system("pdflatex %s" % latexfile)
    # Compile again while the log asks for a rerun (e.g. unresolved references)
    while "Rerun" in open(logfile).read():
        os.system("pdflatex %s" % latexfile)

To test it, I wrote the following Word document (note that I used Word styles to mark the section titles, used a table to insert the code of a tikz picture, and even inserted an image showing the result for that figure, obviously not in the first pass, but later). Note also that I used a Word bulleted list to help mark the itemized list. All these Word styles are dropped when converting to plain text, but they make the display clearer.

[Image: the test Word document]

I saved this document with the name Prueba.docx in the same folder as the script run.py, and ran the script on the Word file:

C:\Users\jldiaz\Downloads\mikemaccana-python-docx-647ee97>python run.py Prueba.docx

After two compilations (the script takes care of compiling again if references are not resolved), the resulting pdf is the following:

[Image: the resulting PDF]

(at this point I used IrfanView to screen-capture the tikz picture and paste it into the word document)

Note: If you use SumatraPDF as pdf reader, you don’t need to close the pdf document before compiling again. SumatraPDF updates the view when the pdf changes.

UPDATE:

Also tested with math, comments and revision marks. Everything works as expected (comments are ignored, revision marks are ignored, and the latest version of the text is what goes into the final .tex file).

However, be careful with carriage returns in the Word file. The «Enter» key in Word inserts an end-of-paragraph mark, which Python translates into a blank line (which is a \par to TeX, so everything is fine). However, in some environments we don't want those blank lines (for example, inside an equation environment, or other places where TeX doesn't expect a \par). We can avoid this by using Shift+Enter in Word, which inserts an end-of-line instead of an end-of-paragraph. Python translates those end-of-lines to spaces.

My experiments with comments, revisions and math:

[Image: Word document with comments, revisions and math]

and the result after the script:

[Image: the resulting PDF]

    LaTeX:

    LaTeX, pronounced "Lay-tech", is a document preparation system for high-quality typesetting. It is mostly used for technical or scientific documents, but it can be used for almost any form of publishing. LaTeX is not a word processor like MS Word or LibreOffice Writer. Instead, LaTeX encourages authors not to worry about the look of their documents and to concentrate on getting the content right. For example, consider the document below:

    This article explains the use of pylatex module
    GeeksforGeeks
    October 2018
    

    To produce this in most word processors, the author would have to decide what layout to use, for example 18pt Helvetica for the title, 12pt Times Roman for the name, and so on. As a result, authors spend time designing the document instead of writing it. LaTeX is based on the idea that authors should get on with writing and leave the design to document designers. So, in LaTeX, you would input the above document as:

    \documentclass{article}
    \title{This article explains the use of pylatex module}
    \author{GeeksforGeeks}
    \date{October 2018}
    \begin{document}
       \maketitle
       Continue reading
    \end{document}
    

     
    Layout of a LaTeX document:
    There are two main parts of a LaTeX document:
    Preamble:

    • The preamble is the first part of a LaTeX file.
    • It contains details about the document such as the document class, author name, title etc.

    Body:

    • The body of a LaTeX document can include sections, tables, mathematical equations, graphs etc.
    • All the contents of the document sit between '\begin{document}' and '\end{document}'.
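
As a small illustration of this two-part layout, the sketch below assembles a preamble and a body into one LaTeX source string using only the Python standard library. The function name build_document and its parameters are made up for this example, and no escaping of LaTeX special characters is attempted.

```python
def build_document(title, author, date, body, documentclass="article"):
    """Return LaTeX source made of a preamble followed by a document body."""
    # Preamble: document class and metadata, before \begin{document}.
    preamble = [
        r"\documentclass{%s}" % documentclass,
        r"\title{%s}" % title,
        r"\author{%s}" % author,
        r"\date{%s}" % date,
    ]
    # Body: everything between \begin{document} and \end{document}.
    body_lines = [
        r"\begin{document}",
        r"\maketitle",
        body,
        r"\end{document}",
    ]
    return "\n".join(preamble + body_lines) + "\n"


if __name__ == "__main__":
    print(build_document(
        title="This article explains the use of pylatex module",
        author="GeeksforGeeks",
        date="October 2018",
        body="Continue reading",
    ))
```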

     
    Some features of Latex are:

    1. Preparing journal articles, technical reports, technical or non-technical books, and also slide presentations.
    2. It provides better control over large documents containing sectioning, references, tables and figures.
    3. It can also be useful for preparing documents containing complex mathematical formulas.
    4. Generation of bibliographies and indexes is automatic in LaTeX.
    5. It also provides multi-lingual typesetting support.
    6. In a latex document we can also add graphics, artwork, and process or spot colour.
    7. Usage of PostScript or metafont fonts is also possible in LaTeX.

     
    Example of a LaTeX document:
    Example 1: A simple LaTeX document with a section, some italic text and a displayed equation, written directly in LaTeX.

    \documentclass{article}
    \usepackage[T1]{fontenc}
    \usepackage[utf8]{inputenc}
    \usepackage{lmodern}
    \usepackage{textcomp}
    \usepackage{lastpage}
    \usepackage[tmargin=1cm, lmargin=10cm]{geometry}
    \usepackage{amsmath}
    \usepackage{tikz}
    \usepackage{pgfplots}
    \pgfplotsset{compat=newest}
    \usepackage{graphicx}

    \begin{document}
    \normalsize
    \section{The regular stuff}
    \label{sec:The regular stuff}
    Some text and some
    \textit{italic text. }
    \newline
    Also some crazy symbols: \$\&\#\{\}
    \subsection{Incorrect math}
    \label{subsec:Incorrect math}
    \[
    2*3 = 22
    \]
    \end{document}

    Example 2: This example uses sections, subsections and labels, and adds a table and a matrix equation.

    \documentclass{article}
    \usepackage[T1]{fontenc}
    \usepackage[utf8]{inputenc}
    \usepackage{lmodern}
    \usepackage{textcomp}
    \usepackage{lastpage}
    \usepackage[tmargin=1cm, lmargin=10cm]{geometry}
    \usepackage{amsmath}
    \usepackage{tikz}
    \usepackage{pgfplots}
    \pgfplotsset{compat=newest}
    \usepackage{graphicx}

    \begin{document}
    \subsection{Table}
    \label{subsec:Table}
    \begin{tabular}{rc|cl}
    \hline
    a&b&c&d\\
    \cline{1-2}
    &&&\\
    e&f&g&7h\\
    \end{tabular}
    \section{Special features}
    \label{sec:Special features}
    \subsection{Correct matrix equations}
    \label{subsec:Correct matrix equations}
    \[
    \begin{pmatrix}
    1&4&4\\
    2&3&4\\
    2&2&5
    \end{pmatrix} \begin{pmatrix}
    800\\
    30\\
    30
    \end{pmatrix} = \begin{pmatrix}
    1040\\
    1810\\
    1810
    \end{pmatrix}
    \]
    \end{document}

    What is PyLaTeX:
    PyLaTeX is a Python library for creating and compiling LaTeX documents. The goal of the library is to be an easy-to-use but extensible interface between Python and LaTeX.

    Some features of pylatex are:

    • We can access all the features of LaTeX from Python using this module.
    • We can make documents with fewer lines of code.
    • Since Python is a high-level language, it is easier to write document-generating code with pylatex than to write the LaTeX by hand.
    • In the LaTeX code above, values in equations had to be calculated by hand and then typed into the document; with Python's arithmetic available, preparing such documents is much easier.
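
The last point can be illustrated without PyLaTeX itself. The sketch below, using only plain Python, computes a matrix product and formats both sides as pmatrix environments, so the result in the document cannot drift from the inputs. The helper names matmul and pmatrix are made up for this example and are not part of PyLaTeX.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pmatrix(m):
    """Format a list of rows as a LaTeX pmatrix environment."""
    rows = " \\\\ ".join(" & ".join(str(v) for v in row) for row in m)
    return "\\begin{pmatrix} %s \\end{pmatrix}" % rows


if __name__ == "__main__":
    M = [[2, 3, 4],
         [0, 0, 1],
         [0, 0, 2]]
    a = [[100], [10], [20]]
    # The right-hand side is computed, not typed in by hand.
    equation = "\\[ %s %s = %s \\]" % (pmatrix(M), pmatrix(a), pmatrix(matmul(M, a)))
    print(equation)
```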

     
    Create a Pylatex document :

    • Install MiKTeX and the pylatex module on your system.
      To install MiKTeX, go to:
      https://miktex.org/download
      

      To install pylatex on a Windows-based operating system, enter the following command in the command prompt:

      python -m pip install pylatex
      
    • To create a document, import the Document class from the pylatex module. LaTeX has different document types: article, report, letter etc. To create a document of type article, create an object of pylatex's Document class and pass 'article' as the documentclass argument:
      doc=Document(documentclass='article')
      
    • To add styling or formatting to the document, import the required classes and helpers from pylatex, for example:
      from pylatex import Document, Section, Subsection
      from pylatex.utils import italic, bold
      
    • To generate a PDF file of the document, call the generate_pdf method on the Document object and pass the name of the PDF document as its argument:
      doc.generate_pdf("Demo_article")
      

     
    Pylatex Example :
    Code 1:

    from pylatex import Document, Section, Subsection, Tabular, Math
    from pylatex.utils import italic

    if __name__ == '__main__':
        geometry_options = {"tmargin": "1cm", "lmargin": "10cm"}
        doc = Document(geometry_options=geometry_options)

        with doc.create(Section('The simple stuff')):
            doc.append('Some regular text and some')
            doc.append(italic('italic text. '))
            # The leading newline becomes a line break in the output
            doc.append('\nAlso some crazy characters: $&#{}')

            with doc.create(Subsection('Math that is incorrect')):
                doc.append(Math(data=['2*3', '=', 9]))

            with doc.create(Subsection('Table of something')):
                with doc.create(Tabular('rc|cl')) as table:
                    table.add_hline()
                    table.add_row((1, 2, 3, 4))
                    table.add_hline(1, 2)
                    table.add_empty_row()
                    table.add_row((4, 5, 6, 7))

        # Writes full.tex and compiles it to full.pdf
        doc.generate_pdf('full', clean_tex=False)

    Code 2:

    import os

    import numpy as np
    from pylatex import Document, Section, Subsection
    from pylatex import Math, TikZ, Axis, Plot, Figure, Matrix, Alignat

    if __name__ == '__main__':
        # kitten.jpg is expected next to this script
        image_filename = os.path.join(os.path.dirname(__file__), 'kitten.jpg')
        geometry_options = {"tmargin": "1cm", "lmargin": "10cm"}
        doc = Document(geometry_options=geometry_options)

        a = np.array([[100, 10, 20]]).T
        M = np.matrix([[2, 3, 4],
                       [0, 0, 1],
                       [0, 0, 2]])

        with doc.create(Section('The fancy stuff')):
            with doc.create(Subsection('Correct matrix equations')):
                # M * a is a matrix product because M is an np.matrix
                doc.append(Math(data=[Matrix(M), Matrix(a), '=', Matrix(M * a)]))

            with doc.create(Subsection('Alignat math environment')):
                with doc.create(Alignat(numbering=False, escape=False)) as agn:
                    agn.append(r'\frac{a}{b} &= 0 \\')
                    agn.extend([Matrix(M), Matrix(a), '&=', Matrix(M * a)])

            with doc.create(Subsection('Beautiful graphs')):
                with doc.create(TikZ()):
                    plot_options = 'height=4cm, width=6cm, grid=major'
                    with doc.create(Axis(options=plot_options)) as plot:
                        plot.append(Plot(name='model', func='-x^5 - 242'))
                        coordinates = [
                            (-4.77778, 2027.60977),
                            (-3.55556, 347.84069),
                            (-2.33333, 22.58953),
                            (-1.11111, -493.50066),
                            (0.11111, 46.66082),
                            (1.33333, -205.56286),
                            (2.55556, -341.40638),
                            (3.77778, -1169.24780),
                            (5.00000, -3269.56775),
                        ]
                        plot.append(Plot(name='estimate', coordinates=coordinates))

            with doc.create(Subsection('Cute kitten pictures')):
                with doc.create(Figure(position='h!')) as kitten_pic:
                    kitten_pic.add_image(image_filename, width='120px')
                    kitten_pic.add_caption("Look it's on its back")

        # Writes full.tex and compiles it to full.pdf
        doc.generate_pdf('full', clean_tex=False)
