Project description
word2Tex: Citation handling and fixing application
The modules included in this package are cite2Tex and fixBibTex which help with
citations when migrating a manuscript into LaTeX.
Installation
pip install word2Tex
cite2Tex
This module can be used as a command-line tool or in python. It goes through a
text file and converts citations into LaTeX format (e.g. Viena et al. 2018 —>
cite{Viena2018}). If a bibtex bibliography file is provided then citations
will be looked up by author and year and the correct citation key will be used.
Additionally, if a bib is provided, citations not found in the bib will be left
alone and you will be notified which citations are missing from the bib.
Usage
To use in the command line:
cite2Tex path_to_file.txt -b path_to_bib.bib -o path_to_output_file.txt
the -b
and -o
flags are optional.
To use in python:
from word2Tex import cite2Tex as c2t fn = 'path_to_file.txt' bib_fn = 'path_to_bib.bib' # optional save_file = 'path_to_save_edited_text.txt' # optional, regardless will always write to a new file # This will allow you to view all citations in the document and see what they will become with open(fn) as f: matches = c2t.find_matches(f.read(), bib=bib_fn) #This creates the dataframe matches which you cna view and check # To convert a file w2t.citations2Tex(fn, bib=bib, save_file=save_file)
fixBibTex
This module allows correction of citation ID in bibtex files when exported from applications such as EndNote. Citation IDs will be set to AuthorYear using the first authors last name. If there are duplicates with this method then the article’s journal initials will be tacked onto the end or an index number to ensure unique IDs.
Usage
To use from command-line:
fixBibTex path_to_bib_file.bib -o output_file.bib
The output file is optional. Regardless this will always save to a new file to avoid dataloss.
In python:
from word2Tex import fixBibTex as fbt fn = 'path_to_bib_file.bib' out_fn = 'out_file_path.bib' # optional fbt.fix_bibtexDB(fn, save_file=out_fn)
Download files
Download the file for your platform. If you’re not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
I have a docx file containing a few equations in different pages. With Python and lxml, I was successful in extracting the content. I now need to convert the equations in Word to Latex. Some of the equations are shown as:
- eq \f (sinx,\r(1 - sin 2 x))
Is there any Python library of any tool that I can use to convert the equation to Latex format?
Here is a snippet of the XML file which I obtained from docxfile/word/document.xml:
<w:p w:rsidR="00677018" w:rsidRPr="007D05E5" w:rsidRDefault="00677018" w:rsidP="00677018">
<w:pPr>
<w:pStyle w:val="w" />
<w:jc w:val="both" /></w:pPr>
<w:r w:rsidRPr="007D05E5">
<w:tab/>
<w:t>a.</w:t>
</w:r>
<w:r w:rsidRPr="007D05E5">
<w:tab/></w:r>
<w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
<w:fldChar w:fldCharType="begin" /></w:r>
<w:r w:rsidRPr="007D05E5">
<w:instrText xml:space="preserve">eq bbc[(aco2hs4(7,-3,-1,2))</w:instrText>
</w:r>
<w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
<w:fldChar w:fldCharType="end" /></w:r>
<w:r w:rsidRPr="007D05E5">
<w:tab/>
<w:t>b.</w:t>
</w:r>
<w:r w:rsidRPr="007D05E5">
<w:tab/></w:r>
<w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
<w:fldChar w:fldCharType="begin" /></w:r>
<w:r w:rsidRPr="007D05E5">
<w:instrText xml:space="preserve">eq f(5,8)</w:instrText>
</w:r>
<w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
<w:fldChar w:fldCharType="end" /></w:r>
<w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
<w:fldChar w:fldCharType="begin" /></w:r>
<w:r w:rsidRPr="007D05E5">
<w:instrText xml:space="preserve">eq bbc[(aco2hs4(7,-3,-1,2))</w:instrText>
</w:r>
<w:r w:rsidR="00453EF1" w:rsidRPr="007D05E5">
<w:fldChar w:fldCharType="end" /></w:r>
</w:p>
paperaj-template — Write jounal papers or thesis in word and convert to LaTeX for submission! (All using GitHub actions!)
TL;DR: You can use any LaTeX template! Sections in the word document (see main.docx) with italicized headings are split into latex files that can be included into your template. See example in main.tex and inclusions.tex. The latex files in paperaj/ folder are autogenerated by the GitHub action from the word document!
This is a template that uses paperaj as a GitHub action.
Paperaj is a combination of bash and python scripts for converting MS word document to a latex document for academic journals. You can use any journal template for latex compilation. This can be used as a standalone script (needs pandoc and latex installed) or as a GitHub action. Just create a repo from this template that uses Paperaj GitHub action and the GitHub will latex-compile your manuscript!
How it works
Paperaj creates a set of plain latex files from the word document in the paperaj folder. Images, tables and referencing are supported during the conversion. These plain latex files can be included in the journal’s latex template using: input{filename}
. See main.docx in the template for word document format. See main.tex in the template to see how you can include paperaj generated latex files in the latex entry file. Just use this template that uses Paperaj GitHub action and the GitHub will latex-compile your manuscript!
Give us a star ⭐️
If you find this project useful, give us a star. It helps others discover the project.
Usage
As GitHub action (recommended)
- Use this github template
- Use the docx in the template
- Add bib and tex files.
- set the names of docx, bib and latex entry in paperaj.env file
- This template generates LaTeX files on push to develop branch and compile to PDF on push to main branch!
If you want to run this locally in your computer (requires pandoc and latex installed), check out paperaj.
Arguments in .env file (Needed only if compiling locally)
- BIBLIO=references.bib
- DOCX=article.docx
- PDF=article.pdf
- LATEXFOLDER=./ # no trailing /
- LATEXENTRY=main.tex
- BIBCOMPILE=bibtex or biber
- TEXCOMPILE=defer or yes
- ACRONYMS=sample.csv
- GLOSSARY=sample.csv
- MINDMAP=create
- CITETAG= cite or citep
- PANDOCPATH=
Figures
- Use TWO_COLUMN or LATEXROTATE in captions of figure
- FIGURE_ or TABLE_ for inline ref
Referencing
cite{AuthorYEAR} inline
Using Zotero
- Use this csl
Flatten into single latex file without inclusions
- Just create a folder called flatten.
arXiv
- Add required latex files to arxiv folder.
Clean version for submission
- The clean latex files without latex comments for submission is in the clean folder.
Mindmapping
plant UML
-
‘** first’
-
‘*** second’
-
‘**_’ adds title
-
Add the above to the Zotero notes for references
Notebook to pdf
- jupyter-nbconvert —to pdf acnode.ipynb
Extract highlights from PDF
pdfannots
Other Instructions:
- set repo permissions to read/write
- set entry.tex as overleaf entry
Contributors
- Bell Eapen |
Elaborating my answer in a comment to the question, this is what I got so far.
You need to install Python (I installed python2.7), and lxml and PIL. The easiest way I’ve found to install the later in Windows is going to http://www.lfd.uci.edu/~gohlke/pythonlibs/, and download lxml-2.3.4.win32-py2.7.exe
and PIL-1.1.7.win32-py2.7.exe
(note that you have to choose the appropiate files for your python version). Running those exe
, the appropiate libraries and bindings are installed.
Then you can download https://github.com/mikemaccana/python-docx. I didn’t try to properly install this one. I only uncompressed it in a folder, open a cmd
shell, navigate to that folder and run the provided examples (example-extracttext.py
and example-makedocument.py
) which worked. My setup was fine.
Then I adapted the code of example-extracttext
to our needs, and wrote the following script, which I named run.py
:
#!/usr/bin/env python2.7
'''
This file opens a docx (Office 2007) file and dumps the text. Then it uses pdflatex to compile it.
'''
from docx import *
import os
import sys
if __name__ == '__main__':
try:
wordfile = sys.argv[1]
latexfile = sys.argv[1].replace('docx', 'tex')
logfile = sys.argv[1].replace('docx', 'log')
document = opendocx(wordfile)
newfile = open(latexfile,'w')
except:
print('Please supply an input file. For example:')
print(''' run.py 'MyDocument.docx' ''')
exit()
# Fetch all the text out of the document we just created
paratextlist = getdocumenttext(document)
# Make explicit unicode version
newparatextlist = []
for paratext in paratextlist:
newparatextlist.append(paratext.encode("utf-8"))
## Print our documnts test with two newlines under each paragraph
newfile.write('nn'.join(newparatextlist))
newfile.close()
## Now use pdflatex to compile the result
os.system("pdflatex %s" % latexfile)
while "Rerun" in open(logfile).read():
os.system("pdflatex %s" % latexfile)
To test it, I wrote the following Word document (note that I used Word styles to mark the section titles, and used a table to insert the code of a tikz picture, and even inserted an image showing the result for that figure, obviously not in the first pass, but later). Note also which I used a Word bulleted list to help marking the itemized list. All this Word styles will be dropped when converting to plain text, but allows us to make the display more clear.
I saved this document with the name Prueba.docx
in the same folder than the script run.py
, and ran the script on the word file:
C:UsersjldiazDownloadsmikemaccana-python-docx-647ee97>python run.py Prueba.docx
After two compilations (the script takes care of compiling again if references are not solved), the resulting pdf
is the following:
(at this point I used IrfanView to screen-capture the tikz picture and paste it into the word document)
Note: If you use SumatraPDF as pdf reader, you don’t need to close the pdf document before compiling again. SumatraPDF updates the view when the pdf changes.
UPDATE:
Tested also with math, comments and revision marks. All works as expected (comments are ignored, revision marks are ignored, latest version of the text is what goes to the final .tex file).
However, caution about carriage returns in the Word file. «Enter» key in Word inserts a end-of-paragraph mark, which is translated by python into a blank line (which is a par
to tex, so everything is fine). However in some environments, we don’t want those blank lines (for example, inside an equation environment, or other places where TeX doesn’t expect a par
). We can avoid this by using Shift+Enter in Word, which inserts an end-of-line instead of an end-of-par. Those end-of-lines are translated by python to spaces.
My experiments with comments, revisions and math:
and the result after the script:
Improve Article
Save Article
Like Article
Improve Article
Save Article
Like Article
Latex :
Latex pronounced as “Lay-tech” is a document making system for high-quality documentation. It is mostly used for technical or scientific document preparation but it can be used for almost all forms of publishing. Latex is not a word processor like MS Word or LibreOffice Writer. Instead, Latex encourages authors not to worry about the look of their documents but to concentrate on getting the right content. For example, consider the below document:
This article explains the use of pylatex module GeeksforGeeks October 2018
To produce this in most word processors, the author would have to decide what layout to use, so would select (suppose) 18pt Helvetica
for the title, 12pt Times Roman
for the name, and so on. This results into author wasting their time designing the document. Latex is based on the idea that let authors get on with writing the document and leave the designing of the document to document designers. So, in Latex, you would input the above document as:
documentclass{article} title{This article explains use of pylatex module} author{GeeksforGeeks} date{October 2018} begin{document} maketitle Continue reading end{document}
Layout of a latex document :
There are two main parts of a latex document:
Preamble :
- Preamble is the first part of a latex file.
- It contains details about the document such as Document class, author name, title etc
Body :
- In the body part of a latex document, sections, tables, mathematical equations, graphs etc can be included
- All the contents of the document are within a ‘begin{document}’ and a ‘end{document}’
Some features of Latex are:
- Preparing journal articles, technical reports, technical or non-technical books, and also slide presentations.
- It provides better control over large documents containing sectioning, references, tables and figures.
- It can also be useful for preparing documents containing complex mathematical formulas.
- Generation of bibliographies and indexes is automatic in LaTeX.
- It also provides multi-lingual typesetting support.
- In a latex document we can also add graphics, artwork, and process or spot colour.
- Usage of PostScript or metafont fonts is also possible in LaTeX.
Example of a LaTeX document:
Example 1: In this example we form a simple latex in order to from latex, we used simple input format as we used in latex.
documentclass
{article}
usepackage
[T1]{fontenc}
usepackage
[utf8]{inputenc}
usepackage
{lmodern}
usepackage
{textcomp}
usepackage
{lastpage}
usepackage
[tmargin=1cm, lmargin=10cm]{geometry}
usepackage
{amsmath}
usepackage
{tikz}
usepackage
{pgfplots}
pgfplotsset
{compat=newest}
usepackage
{graphicx}
begin
{document}
normalsize
section
{The regular stuff}
label
{sec:The regular stuff}
Some text and some
textit
{italic text. }
newline
Also some crazy symbols:
$
&
#
{
}
subsection
{Incorrect math}
label
{subsec:Incorrect math}
[
2*3 = 22
]
end
{document}
Output:
Example 2: In this example we used, label, subsection in order to form a latex.
documentclass
{article}
usepackage
[T1]{fontenc}
usepackage
[utf8]{inputenc}
usepackage
{lmodern}
usepackage
{textcomp}
usepackage
{lastpage}
usepackage
[tmargin=1cm, lmargin=10cm]{geometry}
usepackage
{amsmath}
usepackage
{tikz}
usepackage
{pgfplots}
pgfplotsset
{compat=newest}
usepackage
{graphicx}
subsection
{Table}
label
{subsec:Table}
begin
{tabular}{rc|cl}
hline
a&b&c&d
cline
{1
-
2}
&&&
e&f&g&7h
end
{tabular}
section
{Special features}
label
{sec:Special features}
subsection
{Correct matrix equations}
label
{subsec:Correct matrix equations}
[
begin
{pmatrix}
1&4&4
2&3&4
2&2&5
end
{pmatrix}
begin
{pmatrix}
800
30
30
end
{pmatrix} =
begin
{pmatrix}
810
60
50
end
{pmatrix}
]
end
{document}
Output :
What is Pylatex :
PyLaTeX is a Python library for creating and compiling latex documents. The goal of this library is to be easy but is also to provide an extensible interface between Python and latex.
Some features of pylatex are:
- We can access all the features of LaTeX in python using this module
- We can make documents with fewer lines of code
- Since python is a high-level language it is easier to write code for pylatex in python as compared to LaTeX
- In the above LaTeX code you must have seen that to give equations we have to calculate values and then input in LaTeX document but with python’s added functionality of performing arithmetic operations it is much easier to prepare documents
Create a Pylatex document :
- Install MikTeX and pylatex module in your system and import it into python code.
For installing MikTeX on your system, go to :https://miktex.org/download
For installing pylatex on windows based operating system, enter the following command in command prompt:
python -m pip install pylatex
- To create a document import document class from pylatex module. In latex there are different document types : article, report, letter etc. To create a document of the type article, create an object of the Document class of latex and as an argument pass ‘article’
doc=Document(documentclass='article')
- To add the necessary changes in the document such as styling or formatting, import the classes required in the python code from pylatex. To add different utilities in a latex document using pylatex the following way is feasible
from pylatex import Document, Section, Subsection from pylatex.utils import italic, bold
- To generate PDF file of the document, call the generate_pdf method of the Document class using the object of Document class and make sure to pass the name of the pdf document in its argument in this way
doc.generate_pdf("Demo_article")
Pylatex Example :
Code 1:
import
numpy as np
from
pylatex
import
Document, Section, Subsection, Tabular
from
pylatex
import
Math, TikZ, Axis, Plot, Figure, Matrix, Alignat
from
pylatex.utils
import
italic
import
os
if
__name__
=
=
'__main__'
:
image_filename
=
os.path.join(os.path.dirname(__file__),
'kitten.jpg'
)
geometry_options
=
{
"tmargin"
:
"1cm"
,
"lmargin"
:
"10cm"
}
doc
=
Document(geometry_options
=
geometry_options)
with doc.create(Section(
'The simple stuff'
)):
doc.append(
'Some regular text and some'
)
doc.append(italic(
'italic text. '
))
doc.append(
'nAlso some crazy characters: $&#{}'
)
with doc.create(Subsection(
'Math that is incorrect'
)):
doc.append(Math(data
=
[
'2*3'
,
'='
,
9
]))
with doc.create(Subsection(
'Table of something'
)):
with doc.create(Tabular(
'rc|cl'
)) as table:
table.add_hline()
table.add_row((
1
,
2
,
3
,
4
))
table.add_hline(
1
,
2
)
table.add_empty_row()
table.add_row((
4
,
5
,
6
,
7
))
doc.generate_pdf(
'full'
, clean_tex
=
False
)
Output:
Code 2:
import
numpy as np
from
pylatex
import
Document, Section, Subsection, Tabular
from
pylatex
import
Math, TikZ, Axis, Plot, Figure, Matrix, Alignat
from
pylatex.utils
import
italic
import
os
if
__name__
=
=
'__main__'
:
image_filename
=
os.path.join(os.path.dirname(__file__),
'kitten.jpg'
)
geometry_options
=
{
"tmargin"
:
"1cm"
,
"lmargin"
:
"10cm"
}
doc
=
Document(geometry_options
=
geometry_options)
a
=
np.array([[
100
,
10
,
20
]]).T
M
=
np.matrix([[
2
,
3
,
4
],
[
0
,
0
,
1
],
[
0
,
0
,
2
]])
with doc.create(Section(
'The fancy stuff'
)):
with doc.create(Subsection(
'Correct matrix equations'
)):
doc.append(Math(data
=
[Matrix(M), Matrix(a),
'='
, Matrix(M
*
a)]))
with doc.create(Subsection(
'Alignat math environment'
)):
with doc.create(Alignat(numbering
=
False
, escape
=
False
)) as agn:
agn.append(r
'frac{a}{b} &= 0 \'
)
agn.extend([Matrix(M), Matrix(a),
'&='
, Matrix(M
*
a)])
with doc.create(Subsection(
'Beautiful graphs'
)):
with doc.create(TikZ()):
plot_options
=
'height=4cm, width=6cm, grid=major'
with doc.create(Axis(options
=
plot_options)) as plot:
plot.append(Plot(name
=
'model'
, func
=
'-x^5 - 242'
))
coordinates
=
[
(
-
4.77778
,
2027.60977
),
(
-
3.55556
,
347.84069
),
(
-
2.33333
,
22.58953
),
(
-
1.11111
,
-
493.50066
),
(
0.11111
,
46.66082
),
(
1.33333
,
-
205.56286
),
(
2.55556
,
-
341.40638
),
(
3.77778
,
-
1169.24780
),
(
5.00000
,
-
3269.56775
),
]
plot.append(Plot(name
=
'estimate'
, coordinates
=
coordinates))
with doc.create(Subsection(
'Cute kitten pictures'
)):
with doc.create(Figure(position
=
'h!'
)) as kitten_pic:
kitten_pic.add_image(image_filename, width
=
'120px'
)
kitten_pic.add_caption(
'Look it's on its back'
)
doc.generate_pdf(
'full'
, clean_tex
=
False
)
Output :
Like Article
Save Article
Vote for difficulty
Current difficulty :
Basic